• Nenhum resultado encontrado

Noncognitive skills assessment can be improved with innovative new measures

N/A
N/A
Protected

Academic year: 2019

Share "Noncognitive skills assessment can be improved with innovative new measures"

Copied!
25
0
0

Texto

(1)

Noncognitive skills assessment

can be improved with innovative

new measures

Patrick Kyllonen

Educational Testing Service

Princeton, New Jersey, USA

High Level Policy Forum

Skills for Social Progress

III. Researchers’ Forum: “Measuring Skills that Matter”

Sao Paolo, Brazil

March 25, 2014

(2)

Why alternative measures for

noncognitive assessment?

Self-rating scales are very useful as is

most of

what we know about noncognitive skills is based

on such scales!

But there are problems with these scales

Socially desirable responding

(wanting to look good)

Reference group bias

(who you compare to)

Response style bias

(e.g., extreme responses, modesty)

Cross-cultural comparability

(to compare countries x and y)

(3)

There are methods to address

these problems

Situational judgment tests

Behaviorally anchored rating scales

Performance measures

Ratings by others (teachers, parents)

Forced-choice assessments

Anchoring vignettes

But I’ll focus

on these two

I’ll say

something

about this

(4)

Others’ Ratings

(teachers, parents, friends)

Psychologists’ ratings

of 18-year-

olds’ noncognitive

skills were comparable to or more powerful than IQ in

predicting earnings, employment, and chronic

unemployment 20 years later

(Lindqvist & Vestman, 2011)

Teachers’ ratings

of 8

th

graders’ misbehavior (5

-item

checklist) were comparable to or better than

achievement tests in predicting educational attainment

and earnings 20 years later

(Segal, 2012)

Others ratings

add to & are better than self-ratings in

predicting academic achievement & job performance

(Connelly & Ones, 2010; Oh, Wang, Mount, 2011)

(5)

Copyright © 2014 Educational Testing Service. All rights reserved.

(6)

Single Statements Rating Scale

Drasgow, Stark, Chernyshenko, Nye, Hulin, & White (2012).

Please indicate your answer to each item by clicking on the appropriate circle

Str

ong

ly

di

sag

ree

Di

sa

gr

ee

Ag

ree

Str

ongl

y

ag

ree

1.

I keep my promises

2. I am generally pretty forgiving

For each pair of statements please click on the one that is most like you

1.

I keep my promises

2.

I am generally pretty forgiving

(7)

Forced Choice vs. Single

Statements

Forced-choice shows higher validities vs. single

statements

PISA 2012; Brown & Bartram (2009); Bartram (2013)

For example, correlation between conscientiousness and

school and job performance:

Salgado & Táuriz (2012)

Forced-choice:

r

= .40

Single Statement Ratings:

r

= .16

New approaches use item-response theory scoring of

forced-choice data (

Stark et al, 2005; Brown & Maydeu-Olivares, 2013)

Forced-choice provides better cross-cultural

comparability vs. single statements

(Bartram, 2013)

…next page…

(8)

Cross-cultural comparability

Country-level

correlations (n =

19) between

UN Human

Development Index

(education, life

expectancy, GDP)

Global competitive

index (WEF),

requirements,

efficiency,

innovation

Agreeableness

Single Statement

.09

.39

Forced Choice

.57

.58

Emotional stability

Single Statement

.07

.50

Forced choice

.27

.53

Extraversion

Single Statement

.41

.20

Forced choice

.76

.46

Conscientiousness

Single Statement

-.46

-.40

Forced Choice

-.08

.21

(9)

Copyright © 2014 Educational Testing Service. All rights reserved.

(10)

Cross-Cultural Validity

Attitude-

achievement “paradox”

Positive average within country correlations

“Better attitudes are associated with higher

achievement”

Negative country-level correlations

“Countries with high average attitude scores are

ones with lower average achievement”

(11)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80

-0.60 -0.40 -0.20 0.00 0.20 0.40 0.60

C oun tr y -l e v e l Within-country

Alignment of within-country and country-level correlations

PISA 2012 Field Trial

Math self

efficacy

Math self

concept

Failure

attribution

SJTs

Math

interest

Perceived control

Disciplinary climate

Familiarity with math

concepts

Personality

Math

Anxiety

More context,

concrete,

countable

Weak anchors,

abstract,

vague

Average correlation within country (

N

= 1,000+ students)

-.60 -.40 -.20 .00 +.20 +.40 +.60

Cou

ntry

-level

correlati

on

(60 countri

es)

-.80

-.60

-.40

-.20

.00

+.20 +.

40

+.60

Correlations with Mathematics Achievement Scores

(12)

Anchoring Vignettes

PISA 2012: anchoring vignettes (and forced choice)

“solved” this problem

Anchoring vignettes are a method for rescaling Likert

scale responses to respondent’s personal anchors

See: Gary King’s website on anchoring vignettes:

http://gking.harvard.edu/vign/

(King et al., 2004; King

& Wand, 2007)

Growing in popularity

(13)

ST61

01

Below you will find descriptions of three mathematics

teachers. Read each of the descriptions of these teachers.

Then let us know to what extent you agree with the final

statement.

(Please check only one box on each row.)

Strongly

agree

Agree

Disagree Strongly

disagree

a)

Ms. Anderson assigns mathematics

homework every other day. She always gets

the answers back to students before

examinations.

Ms. Anderson is concerned

about her students’ learning.

1 2 3 4

b)

Mr. Crawford assigns mathematics homework

once a week. He always gets the answers

back to students before examinations.

Mr.

Crawford is concerned about his students’

learning.

1 2 3 4

c)

Ms. Dalton assigns mathematics homework

once a week. She never gets the answers

back to students before examinations.

Ms.

Dalton

is concerned about her students’

learning.

1 2 3 4

(14)

ST61

01

Below you will find descriptions of three mathematics

teachers. Read each of the descriptions of these teachers.

Then let us know to what extent you agree with the final

statement.

(Please check only one box on each row.)

Strongly

agree

Agree

Disagree Strongly

disagree

a)

Ms. Anderson assigns mathematics

homework every other day. She always gets

the answers back to students before

examinations.

Ms. Anderson is concerned

about her students’ learning.

1 2 3 4

b)

Mr. Crawford assigns mathematics homework

once a week. He always gets the answers

back to students before examinations.

Mr.

Crawford is concerned about his students’

learning.

1 2 3 4

c)

Ms. Dalton assigns mathematics homework

once a week. She never gets the answers

back to students before examinations.

Ms.

is concerned about her students’

1 2 3 4

(15)

ST61

01 Below you will find descriptions of three mathematics

teachers. Read each of the descriptions of these

teachers. Then let us know to what extent you agree with

the final statement.

(Please check only one box on each row.)

Strongly

agree

Agree

Disagree Strongly

disagree

a)

Ms. Anderson assigns mathematics

homework every other day. She always gets

the answers back to students before

examinations.

Ms. Anderson is concerned

about her students’ learning.

1 2 3 4

b)

Mr. Crawford assigns mathematics

homework once a week. He always gets the

answers back to students before

examinations.

Mr. Crawford is concerned

about his students’ learning.

1 2 3 4

c)

Ms. Dalton assigns mathematics homework

once a week. She never gets the answers

back to students before examinations.

Ms.

Dalton

is concerned about her students’

learning.

1 2 3 4

Student “B’s” responses

(16)

01 Below you will find descriptions of three mathematics

teachers. Read each of the descriptions of these teachers.

Then let us know to what extent you agree with the final

statement.

(Please check only one box on each row.)

Strongly

agree

Agree

Disagree Strongly

disagree

a) Ms. Anderson assigns mathematics homework every other day. She always gets the answers back to students before examinations. Ms. Anderson is

concerned about her students’ learning.

1 2 3 4

b) Mr. Crawford assigns mathematics homework once a week. He always gets the answers back to

students before examinations. Mr. Crawford is

concerned about his students’ learning.

1 2 3 4

c) Ms. Dalton assigns mathematics homework once a

week. She never gets the answers back to students before examinations. Ms. Dalton is concerned

about her students’ learning.

1 2 3 4

02

My teacher lets students know they need

to work hard.

1 2 3 4

For Student “A” this can be interpreted as “like the middle hypothetical

(17)

01 Below you will find descriptions of three mathematics

teachers. Read each of the descriptions of these teachers.

Then let us know to what extent you agree with the final

statement.

(Please check only one box on each row.)

Strongly

agree

Agree

Disagree Strongly

disagree

a) Ms. Anderson assigns mathematics homework every other day. She always gets the answers back to students before examinations. Ms. Anderson is

concerned about her students’ learning.

1 2 3 4

b) Mr. Crawford assigns mathematics homework once a week. He always gets the answers back to

students before examinations. Mr. Crawford is

concerned about his students’ learning.

1 2 3 4

c) Ms. Dalton assigns mathematics homework once a

week. She never gets the answers back to students before examinations. Ms. Dalton is concerned

about her students’ learning.

1 2 3 4

02

My teacher lets students know they need

to work hard.

1 2 3 4

For Student “B” this can be interpreted as “better than the best hypothetical

teacher”

Student “B’s” responses

(18)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80

-0.60 -0.40 -0.20 0.00 0.20 0.40 0.60

C oun tr y -l e v e l

Alignment of within-country and country-level correlations

PISA 2012 Field Trial

Math self

efficacy

Math self

concept

Failure

attribution

SJTs

Math

interest

Perceived control

Disciplinary climate

Familiarity with math

concepts

Personality

Math

Anxiety

More context,

concrete,

countable

Weak anchors,

abstract,

vague

-.60 -.40 -.20 .00 +.20 +.40 +.60

Cou

ntry

-level

correlati

on

(60 countri

es)

-.80

-.60

-.40

-.20

.00

+.20 +.

40

+.60

(19)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 -0.80 -0.60 -0.40 -0.20 0.00 0.20 0.40 0.60 0.80

-0.60 -0.40 -0.20 0.00 0.20 0.40 0.60

C oun tr y -l e v e l Within-country

Alignment of within-country and country-level correlations

PISA 2012 Field Trial

Teacher-Support

(Likert)

Mathematics Interest

(Likert)

Mathematics Interest

(anchored from

Teacher-Support)

Teacher-Support

(anchored)

Average correlation within country (

N

= 1,000+ students)

-.60 -.40 -.20 .00 +.20 +.40 +.60

Cou

ntry

-level

correlati

on

(60 countri

es)

-.80

-.60

-.40

-.20

.00

+.20 +.

40

+.60

Correlations with Mathematics Achievement Scores

(20)

Findings so far

We have developed anchoring vignettes for many

constructs, from Big 5 to emotional intelligence, for

students and teachers

Anchoring vignettes work very well on poorly anchored

scales (much of personality assessment)

They improve cross-country comparability; they also

increase validity within a country

Anchoring vignettes developed for one scale can be used to

adjust other scales

It is important to write vignettes so that students rate them

appropriately

Even without rescoring, they work by giving a frame of

(21)

Copyright © 2014 Educational Testing Service. All rights reserved.

(22)

Summary

Does this

method

address this

problem ?

Anchoring

vignettes

Forced

choice

Others’

ratings

Situational

Judgment

Tests (SJTs)

Behavior

anchored

rating

scales

(BARS)

Per-

formance

measures

Differentiation

yes

yes

no

yes

sometimes In principle

Cultural

comparability

yes

yes

no

no

sometimes

probably

Social desirability

sometimes

yes

yes

somewhat

no

probably

Reference bias

yes

yes

no

yes

yes

In principle

Response style

bias

yes

yes

no

yes

yes

In principle

Disadvantages

More testing time

x

xx

xx

High

dev’lp

costs

xx

xx

?

(23)

Summary

Anchoring vignettes

can increase validity and address

cross-cultural comparability; they require more time

Forced-choice

additionally prevent socially desirable

responses (“faking good”) but take even more time

Others’ (teachers, parents) ratings

are useful in that

they provide a different frame of reference, and research

shows they are more predictive of outcomes

Other methods (

SJTs, BARS

) are useful; they have high

development costs

Performance measures

are potentially ideal but we do

not have many of these, yet

Using one or more of these techniques can increase

the quality of noncognitive data available for use in

analysis & policy

(24)

References

Bartram, D. (2013). Scalar equivalence of OPQ32: Big Five profiles of 31 countries. Journal of

Cross-Cultural Psychology, 44(1), 61-83.

Brown, A., & Bartram, D. (2009, April). Doing less but getting more: Improving forced-choice

measures with IRT. Paper presented at the 24th annual conference of the Society for

Industrial and Organizational Psychology, New Orleans, LA.

Brown, A. & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18(1), 36-52. DOI: 10.1037/a0030641

Connelly BS & Ones DS (2010). An other perspective on personality: meta-analytic integration of observers' accuracy and predictive validity. Psychol Bull, 136(6), 1092-1122.

Drasgow, F., Stark, S., Chernyshenko, O.S., Nye, C.D., Hulin, C.L., White, L. A. (2012). Development of the Tailored Adaptive Personality Assessment System (TAPAS) to Support Army Selection

and Classification Decisions. ARI Technical Report 1311. Fort Belvoir, VA: U.S. Army Research

Institute for the Behavioral and Social Sciences

(25)

References

King, G., Christopher JL ,Murray, C.J.L., Salomon, J.A., & Tandon, A. (2004). Enhancing the Validity and Cross-cultural Comparability of Measurement in Survey Research. American Political Science Review 98: 191–207 King, G., & Wand, J. (2007). Comparing Incomparable Survey Responses: New Tools for Anchoring Vignettes.

Political Analysis 15: 46-66.

Kyllonen, P.C. (2013). Soft skills for the workplace. Change, 45(6), 16-23.

Kyllonen, P.C., & Bertling, J. (2014). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis. Boca Raton, FL: CRC Press.

Lindqvist, E., Vestman, R. (2011). The Labor Market Returns to Cognitive and Noncognitive Ability: Evidence from the Swedish Enlistment. American Economic Journal: Applied Economics, 3(1): 101-28.

Oh, I.-S., Wang, G., Mount, M.K. (2011). Validity of observer ratings of the five-factor model of personality traits: A meta-analysis. Journal of Applied Psychology, 96(4), 762-773.

Salgado, J.F., & Táuriz, G. (2012): The Five-Factor Model, forced-choice personality inventories and

performance: A comprehensive meta-analysis of academic and occupational validity studies, European Journal of Work and Organizational Psychology, DOI:10.1080/ 1359432X.2012.716198

Segal, C. (2013). Misbehavior, Education, and Labor Market Outcomes, The Journal of the European Economic Association, 11(4), 743-779.

Referências

Documentos relacionados

Através da análise podemos observar o crescimento do impacto da I4.0 e as suas características assim como o impacto face às práticas de gestão de recursos humanos inerentes ao

Most specimens sent corresponded to eyelid (54.7%), followed by conjunctiva (26.7%); the most common pathological diagnosis was malignant epithelial lesions (22.48%), followed

Com a realização deste estudo pretendia-se confirmar que o pH urinário, temperatura rectal e os corpos cetónicos sanguíneos são indicadores precoces de susceptibilidade de

O sistema de energia gerido pelo algoritmo, requer o uso de conversores DC-DC flexíveis, capazes de controlar a corrente na saída, o que permite interligação

The hormetic model also providesdecision- m akers in regu latory agen cies with a m u ch broader array of options in the risk assessment process; with the hormetic model, they

Congénita, em que os participantes foram pais de crianças submetidas a cirurgia cardíaca, concluíram que após o regresso a casa, a competência percebida pela maior

De facto, a vivência da parentalidade proporciona momentos únicos e gratificantes, porém confronta igualmente os pais com novas exigências, que podem constituir um risco

Também Estrela (1984,cit in Valério 2012) e Alves (1994, cit in Valério, 2012), agruparam os fatores determinantes da insatisfação profissional na atividade docente em