Cultural Bias in the SON-R Test: Comparative Study of Brazilian and Dutch Children 1
Peter J. Tellegen 2, University of Groningen
Jacob A. Laros, Universidade de Brasília
1 Studies carried out with the support of CAPES, CNPq, and the SON Fund of the University of Groningen, the Netherlands. Thanks to the students of the Universidade de Brasília and the University of Groningen.
2 Address: Institute of Psychology Heymans, Grote Kruisstraat 2/1, Groningen, The Netherlands, 9712 TS. E-mail: P.J.Tellegen@ppsw.rug.nl
ABSTRACT – The present study, including 83 Brazilian and 51 Dutch children, evaluated the presence of cultural bias in items of the SON-R 5½-17 that make use of concrete objects and situations. Two procedures were followed to detect item bias. The first consisted of asking the children, immediately after an incorrect answer, whether they recognized the pictures. The second procedure compared item difficulties of the Brazilian children with those of the Dutch children belonging to the standardization sample of the SON-R 5½-17. Fourteen items were detected with bias: ten of these favored the Dutch group and four the Brazilian group. The cultural disadvantage for Brazilian children is rather small, taking the large number of investigated items into account. This study indicated which items of the SON-R 5½-17 should be improved, not only for reasons of cultural bias, but also because children, irrespective of their cultural background, encountered problems with the recognition of several pictures.
Keywords: cultural bias; item bias; nonverbal intelligence test.
Cultural Bias in the SON-R Test: Comparative Study between Brazilian and Dutch Children
RESUMO – In the present study, which included 83 Brazilian and 51 Dutch children, the presence of cultural bias was examined in the items of the SON-R 5½-17 that use concrete objects and situations. Two procedures were followed to detect item bias. In the first, the children were asked, immediately after a wrong answer, whether they recognized the drawings used in the items. In the second procedure, the difficulty of the items for the Brazilian children was compared with the difficulty of the items for the Dutch children of the standardization sample of the SON-R 5½-17. Fourteen biased items were identified, of which ten favor the Dutch children and four the Brazilian children. The cultural disadvantage for the Brazilian children is quite small, considering the large number of items investigated. This study indicated which items of the SON-R 5½-17 need to be improved, not only for reasons of cultural bias, but also because children, regardless of cultural background, had problems recognizing several drawings.
Keywords: cultural bias; item bias; nonverbal intelligence test.
In Brazil and other South American countries there is a great need for standardized and validated psychological tests, especially for tests in relation to intelligence for children and youth (Hu & Oakland, 1991; Oakland, Wechsler, Bensuan & Stafford, 1994; Muñiz, Prieto, Almeida & Bartram, 1999). A nonverbal intelligence test that might fill this need is the SON-R 5½-17, the Snijders-Oomen nonverbal intelligence test for children and adolescents in the ages of 5½ to 17 years (Snijders, Tellegen & Laros, 1989).
The test was originally developed in order to be able to assess the learning ability of children who were severely handicapped in their language development. At that time, existing nonverbal intelligence tests were not suited for the examination of a broad spectrum of learning abilities because they consisted mainly of performance tests related to spatial abilities (like form boards, mazes, and mosaics). The first SON-test of 1943 consisted of nonverbal subtests related to abstract and concrete reasoning, and contained norms for deaf children from 4 to 14 years of age. At present, the fourth generation of SON nonverbal intelligence tests consists of two versions: one for younger children, the SON-R 2½-7, and one for older children, the SON-R 5½-17. The present article describes a study which was carried out with the SON-test for older children.
The SON-R 5½-17 consists of the following subtests: Categories, Analogies, Situations, Stories, Mosaics, Patterns, and Hidden Pictures. The first three are multiple choice tests, the remaining four are action tests. In action tests the solution has to be sought in an active manner, which makes observation of behavior possible. The SON-R 5½-17 can be divided into four types of tests according to their contents: abstract reasoning tests (Categories and Analogies), concrete reasoning tests (Situations and Stories), spatial tests (Mosaics and Patterns), and perceptual tests (Hidden Pictures). The subtest items are presented using an adaptive test procedure. Adaptive procedures are aimed at limiting the number of items to be administered with relatively little loss of reliability (Weiss, 1982). The adaptive procedure of the SON-R 5½-17 is based on the division of its subtests in two or three parallel series of about 10 items. The first series of a subtest serves to estimate the subject’s general level of performance. Of the following series only those items that can improve and refine this first estimation are administered. The adaptive test procedure reduces the number of items to be presented by about 50%. The application of the test takes about 90 minutes; the shortened version, consisting of four subtests, takes approximately 45 minutes.
The standardization was performed using a representative sample of children of the Netherlands, consisting of 1,350 subjects from 6 to 14 years of age. Each age group was represented by a sample of 150 subjects which was stratified according to sex, educational type, and demographic variables. The expansion of the norms to the ages of 5½ to 17 was achieved through extrapolation. Norm tables for 38 age groups make it possible to draw comparisons at the subtest level. The total test result is represented as an IQ score (with probability interval), as a percentile score, and as a reference age. In addition to the 1,350 hearing subjects, 768 deaf children were also examined with the test. A computer program for the calculation of standardized scores is supplied with the test. After the birth date, the test date, and the raw subtest scores have been entered, the computer program calculates the standardized scores automatically based on the subject’s exact age. The SON-R 5½-17 has been reviewed by the COTAN, the test commission of the Netherlands Institute of Psychologists, and received the highest possible ratings on all seven categories of evaluation. The categories of reviewing were as follows: (1) basics of the construction of the test; (2) quality of the test materials; (3) quality of the manual; (4) norms; (5) reliability; (6) construct validity, and (7) criterion validity. The SON-R 5½-17 is being used in various countries; the manual is available in English, German, and Dutch.
Advocates of culture fair intelligence tests have criticized traditional tests for general intelligence, like the Wechsler intelligence tests and the Stanford-Binet tests, because they often make an appeal to specific language skills, and in so doing place members of cultural minority groups at a disadvantage. Advocates of learning potential tests have criticized traditional tests for general intelligence because these tests would measure the end result of prior learning, rather than learning potential (Tellegen & Laros, 1993). By focussing on the end result of prior learning, these tests would underestimate the learning ability of persons from lower socio-economic background, and of members of ethnic minorities. One could state that these tests focus more on “crystallized intelligence” than on “fluid intelligence” (Cattell, 1971).
The SON-R 5½-17 differs in three essential aspects from traditional intelligence tests: in the first place, it does not require specific language abilities; in the second place, it uses an adaptive test procedure; and in the third place, it offers feedback after each item which informs the subject whether the answer is correct. A major advantage of giving feedback is that the subject is given the opportunity to learn during the test administration. With these essential aspects, the SON-R 5½-17 shows more resemblances to culture fair intelligence tests and tests for learning potential than to traditional tests for general intelligence.
In order to contribute to the demand in Brazil for intelligence tests of good quality, the SON-R 5½-17 has to be standardized and validated for that country. Prior to standardization, however, it is necessary to verify whether the materials used in the test are familiar to Brazilian children and adolescents. To obtain such evidence the present study was undertaken. Thus, the goal of the present study is to discern if, and to what extent, adaptations of the test materials of the SON-R 5½-17 are required in order to assess the construct of (nonverbal) intelligence in Brazil with this test in a fair way. This goal is in accordance with the guidelines on test use of the International Test Commission (Van de Vijver & Hambleton, 1996). One of the guidelines states that test developers/publishers should provide evidence that item content and stimulus materials are familiar to all intended populations.
The fact that the items and the examples of the SON-R do not need to be translated makes the test potentially suitable for international and cross-cultural research. The adaptation process of nonverbal tests for multiple cultures does not include the difficult and often extremely problematic test translation phase and is therefore much less complicated than for (partly) verbal tests.
The research finding that immigrant children in the Netherlands (mainly children from Morocco, Turkey, Suriname and the Dutch Antilles) perform better on the SON-tests than on traditional intelligence tests like the WISC-R (Laros & Tellegen, 1991; Tellegen, Winkel, Wijnberg-Williams & Laros, 1998) is an indication of the culture-fairness of the SON-R for immigrant groups in the Dutch society.
A second research result with positive implications for the culture-fairness of the SON-tests for immigrants is the finding that there is no relation between length of stay in the Netherlands and their IQ-scores, suggesting that in the SON test, intelligence is not dependent on knowledge of the Dutch language (Snijders et al., 1989). As a third research result we should mention that the performance of immigrant children is similar on the SON-R subtests with meaningful pictures compared to the subtests that use materials of an abstract nature.
The aforementioned positive indications of the culture-fairness of the SON-tests are based on research results with immigrant groups in the Netherlands and offer no guarantee for the culture-fairness of the test in a South-American country. The present study was undertaken to obtain indications of the degree of culture-fairness of the SON-R in Brazil.
Method
Participants
The Brazilian sample included 83 children (41 male, 42 female) ranging in age from 7 to 14 years (M = 10.5, SD = 2.1). The children were recruited from two state schools in Brasilia. Within the schools children were selected on the basis of their age; the children who were selected had their birthday as close as possible to half a year from the test date. The Dutch sample consisted of 51 children (24 male, 27 female) ranging in age from 7 to 12 years (M = 9.9, SD = 1.3). The participants were recruited from three schools in the northern part of the Netherlands. The same selection criteria were used as for the Brazilian sample.
Instruments
The SON-R 5½-17 is the revised version of the Snijders-Oomen Nonverbal intelligence test for children and adolescents of 5½ to 17 years (Snijders et al., 1989; Tellegen & Laros, 1993). The test consists of seven subtests, which are, in order of administration: Categories, Mosaics, Hidden Pictures, Patterns, Situations, Analogies, and Stories. The standardization of the SON-R 5½-17 in the Netherlands is based on a nationwide sample of 1,350 children and adolescents varying in age from 6 to 14 years. The reliability coefficient (alpha stratified) of the IQ increases from .90 at six years to .94 at fourteen years with a mean value of .93. The average reliability of the subtests is .76. The validity of the SON-R 5½-17 is evident from the clear relationship with different indicators of school career such as school type, class repetition and school report marks. The multiple correlation of the SON-R IQ with these indicators of school career is .59.
The subtests of the SON-R 5½-17 can be divided in two types of tests according to the material that is being used: tests that use meaningful picture material (Categories, Situations, and Stories) and tests that use non-meaningful materials such as geometrical forms (Mosaics, Patterns, and Analogies). Hidden Pictures is a case on its own because the task in this subtest, recognition, is independent of the type of material used. In the present study only subtests that use meaningful picture material were included, because cultural bias is more likely to occur with this kind of subtests than with those that use non-meaningful materials such as geometrical forms (Jensen, 1980). In the subtest Categories, a child has to choose two pictures that are missing in a certain category out of five possible pictures. The task in Situations is to indicate the missing parts of drawings of concrete situations. In Stories the child has to order a number of cards in such a way that they form a logical story. Categories and Situations are multiple choice tests, while Stories is a so-called “action” test, where the child has to construct the solution rather than to choose the right alternative. The subtest Categories consists of 27 items, Situations of 33 items, and Stories of 20 items.
Procedure
The first step in this study was the translation of the instructions into the Portuguese language. After obtaining parental permission, the SON-R subtests Categories, Situations and Stories were administered to the Brazilian children. Six graduate psychology students administered the subtests after being trained in the administration of these subtests. Supervision was provided by one of the authors of the SON-R 5½-17. The individual administration of the subtests, which occurred at the school of the pupils, required approximately one hour.
The adaptive procedure of the SON-R was not used in this study. Instead, the items were administered in order of increasing difficulty. The administration of the subtests Categories and Situations was stopped after 12 errors; with the subtest Stories a stopping rule of eight errors was maintained. After each item the child was informed whether the answer was right or wrong. Providing feedback is an important part of the standard administration procedure of the SON-R, because it clarifies the instructions and gives the examinee the opportunity to learn from his errors and successes and to adjust his problem solving strategy. Immediately after an incorrect answer to an item, the children were asked whether they recognized and could name the pictures used in the item.
In Categories each item contains eight pictures: three example pictures that define the category and five alternatives from which to choose the two correct pictures. In the case of Situations, the subjects were asked if they recognized and could describe the main drawing and the missing parts. With Stories the children were asked to describe the pictures that had to be ordered. In addition to the administration of the three subtests, other data of the Brazilian children were gathered to obtain information about the validity of the test. For the 83 participants of the Brazilian sample, school marks on mathematics, science, and Portuguese were collected. The school teachers of the Brazilian children were requested to evaluate them on their degree of motivation, cooperation, and concentration during class hours. The evaluation was given on a 3-point scale, ranging from “low”, via “average” to “high”.
The authors of the SON-R 5½-17 provided supervision. The study in the Netherlands was carried out to verify if problems of the Brazilian children with the recognition of particular pictures of Categories and Situations were really due to cultural bias or were caused by other factors. Examples of other causes of recognition problems are unclearly drawn pictures or pictures which represent objects that are infrequently used. Item bias was assumed to be present if one group showed more problems with the recognition of a particular picture than the other group. If both groups indicated considerable problems, this was an indication that the picture as such was difficult to recognize.
The subtest Stories was not administered in the Netherlands because the Brazilian children did not show any problems with the recognition of the pictures used in this subtest. In the Netherlands the same administration procedure as in Brazil was followed for the subtests Categories and Situations. The individual administration of the subtests, which occurred at the school of the pupils, required approximately three-quarters of an hour.
Data analysis
In all analyses age-corrected standard scores were used (M = 100, SD = 15). The standardized subtest scores were obtained using the computer program that is included with the SON-R 5½-17. In some analyses the difference between groups was described using d-ratios. The d-ratio expresses the difference between the means in units of the standard deviation of the samples. Coefficient lambda2 of Guttman (λ2) was chosen for the estimation of reliability because it does not underestimate reliability as much as coefficient alpha, especially in the case of short tests (Ten Berge & Zegers, 1978). Since the reliability coefficients λ2 were calculated on the basis of samples that were heterogeneous in relation to age, a correction for the influence of age was applied. A second correction of the reliability coefficients was applied in relation to the variance of the standardized scores (Guilford & Fruchter, 1978). To test the significance of the difference in percentage of unknown pictures between the Brazilian and Dutch sample, the Fisher exact probability test was used (Siegel & Castellan, 1988).
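As an illustration of the d-ratio, the following minimal sketch divides the difference in means by a pooled standard deviation. The pooling formula and the function name `d_ratio` are assumptions made here for illustration; the text only states that the difference is expressed in units of the standard deviation of the samples. The input values are those reported in Table 1.

```python
import math

def d_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD (assumed pooling)."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Means and SDs of the Brazilian (N=83) and Dutch (N=51) groups from Table 1.
categories = d_ratio(94.8, 15.9, 83, 100.4, 16.5, 51)
situations = d_ratio(95.0, 20.3, 83, 109.2, 15.2, 51)
print(f"Categories d = {categories:.2f}")
print(f"Situations d = {situations:.2f}")
```

Under this pooling assumption the sketch reproduces the d-ratios of -.35 and -.77 listed in Table 1.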
In the analysis of Differential Item Functioning (DIF) the procedure of Bilog-MG was employed (Zimowski, Muraki, Mislevy & Bock, 1996). This procedure assumes that differential item functioning only extends to the difficulty of the items and not to the discriminating power. In other words, the assumption is made that the slope parameters (a-parameters) of the items are homogeneous across groups. The item difficulties are allowed to differ from one group to another. For the groups that are being compared different latent distributions are assumed. Bilog-MG estimates the DIF effects of the items as contrasts between the reference group and the so-called focal group(s). In our analysis the reference group was the standardization sample of the SON-R 5½-17 of 1,350 subjects from the Netherlands, and the focal group was the sample of 83 Brazilian subjects. An item was classified as a DIF item when the difference of the b-parameters in the reference and focal group was statistically significant at the 5% level.
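The classification rule can be sketched as follows. The text does not state which test statistic was used; this sketch assumes that the contrast in b-parameters is divided by its standard error and compared with two-sided standard normal critical values. The function name `dif_flag` is hypothetical, and the three example contrasts are taken from Table 6.

```python
def dif_flag(b_difference, standard_error):
    """Return '**' (1% level), '*' (5% level) or '' for a b-parameter contrast."""
    z = abs(b_difference / standard_error)
    if z > 2.576:   # two-sided 1% critical value of the standard normal
        return "**"
    if z > 1.960:   # two-sided 5% critical value
        return "*"
    return ""

# (difference in b, standard error) reported for three Categories items in Table 6.
contrasts = {"2c": (-0.50, 0.23), "6a": (-0.67, 0.17), "6b": (0.43, 0.17)}
flags = {item: dif_flag(diff, se) for item, (diff, se) in contrasts.items()}
print(flags)
```

With this assumed rule the flags agree with the asterisks printed for these items in Table 6.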
Results
Overall performance
Means, standard deviations, reliability coefficients (λ2) and d-ratios of the differences in mean scores are presented in Table 1. The Brazilian children obtained a lower mean score on the subtests Categories and Situations in comparison with the Dutch children. These differences are significant at the 5% level. According to Cohen’s classification (Cohen, 1992), the d-ratio of the difference between the mean score for the two groups for the subtest Categories indicates a small effect size, while the d-ratio for the subtest Situations suggests a medium effect size.
Table 1. Means, standard deviations and reliabilities (λ2) on three SON-R subtests for the Brazilian and Dutch group and d-ratios between the two groups

             Brazilian group (N=83)      Dutch group (N=51)
             Mean    (SD)     λ2         Mean    (SD)     λ2      d-ratio
Categories   94.8    (15.9)   .74        100.4   (16.5)   .75     -.35
Situations   95.0    (20.3)   .67        109.2   (15.2)   .71     -.77
Stories      97.5    (16.9)   .69        -                        -

Notes - The reliability coefficients λ2 were corrected for age and for the standard deviations of the two groups.
- The d-ratio expresses the difference between the means of the Brazilian and Dutch group in units of the standard deviation (SD).
The higher d-ratio for Situations is a consequence of the relatively high performance of the Dutch children on this subtest. Within the Brazilian group the differences between the three subtests are not statistically significant. The reliability coefficients λ2 of .74 and .67 for Categories and Situations for the Brazilian children are quite similar to the values of .75 and .71 in the standardization sample of the Netherlands. The correlations between the subtests are relatively high in the Brazilian sample (Table 2). The correlations involving the subtest Situations are significantly higher at the 5% significance level for the Brazilian children than for the Dutch children. The higher correlations of the subtest Situations in the Brazilian sample might be related to the high standard deviation of this subtest (SD = 20.3) compared to the standard deviation of this subtest in the standardization sample of the Netherlands (SD = 15.0).
Table 2. Correlations (corrected for unreliability) between the three SON-R subtests for the Brazilian group and the Dutch standardization sample

                         Brazilian group    Dutch standardization sample
                         (N=83)             (N=1,350)
Categories-Situations    .77                .59
Categories-Stories       .58                .51
Situations-Stories       .84                .74
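A correction for unreliability is commonly done with Spearman's attenuation formula, which divides the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes that this formula was used, which the text does not confirm. The reliabilities are the λ2 values of the Brazilian group from Table 1; the observed correlation of .50 is a hypothetical value chosen only for illustration, since the article reports only corrected correlations.

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y):
    """Spearman's correction: observed r / sqrt(product of the two reliabilities)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Reliabilities (λ2) of Categories and Situations in the Brazilian group (Table 1).
rel_categories, rel_situations = 0.74, 0.67
# Hypothetical observed correlation, not a value from the study.
r_obs = 0.50
r_corrected = disattenuate(r_obs, rel_categories, rel_situations)
print(f"corrected r = {r_corrected:.2f}")
```

The corrected value is always at least as large as the observed one, and the gap widens as the reliabilities drop.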
Recognition of pictures
The first procedure to identify the presence of item bias was based on the recognition of the pictures by the children who gave a wrong answer to an item. The basic idea behind this procedure is that children should not fail an item because they are unfamiliar with one or more pictures used in that item. Table 3 displays the twelve items of the subtest Categories containing pictures unknown to at least 20% of the Brazilian or Dutch children who could not solve the item. Each item of the subtest Categories is composed of eight different pictures, three to define the category and five pictures from which two should be chosen that belong to the category. The second column of the table describes the pictures used in these items. The third column displays the number of Brazilian children who gave an incorrect answer to the item. The fourth column shows the percentage of these children that did not recognize the picture. The fifth and sixth column show the same information for the Dutch group. The last column shows the difference in percentage for the two groups and whether this difference is statistically significant at the 5% level.
According to the results of Table 3, pictures used in items 2b, 2c, 4b, 6a, and 8a were unknown to a higher percentage of the Brazilian group compared to the Dutch group. This is an indication that these five items are biased in favor of the Dutch group.
A higher percentage of the Dutch group encountered problems in the recognition of one of the pictures used in items 4a and 9a: these two items seem to be biased in favor of the Brazilian group. Both groups showed the same degree of problems recognizing pictures in items 1c, 3a, 3b, 5c, and 9c. These five items do not seem to be culturally biased since the pictures were difficult to recognize for both groups.
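The Fisher exact test behind the asterisks in Table 3 can be sketched with the standard library alone. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided p-value is defined here as the total probability of all tables with the same margins that are no more likely than the observed one, which is one common convention and an assumption about the authors' procedure. The counts are derived from Table 3: for item 2b, 50.0% of 20 Brazilian and 0.0% of 8 Dutch children did not know the picture; for item 1c, 23.1% of 13 and 20.0% of 5.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Unnormalized hypergeometric weights for every table with the same margins.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(total, col1)

# Rows: Brazilian, Dutch; columns: picture unknown, picture known.
p_2b = fisher_exact_two_sided(10, 10, 0, 8)   # item 2b (electric outlet)
p_1c = fisher_exact_two_sided(3, 10, 1, 4)    # item 1c (factory)
print(f"item 2b: p = {p_2b:.3f}; item 1c: p = {p_1c:.3f}")
```

Under these assumptions item 2b falls below the 5% level and item 1c does not, in line with the markings in Table 3.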
Table 4 presents the same type of information for the subtest Situations. Inspection of this table reveals that there are nine items of Situations with a picture unknown to at least 20% of the children who responded incorrectly to the item. The last column of the table shows that item 10a is the only item for which the difference was statistically significant at the 5% level. In other words, only one item of the subtest Situations is biased according to this procedure. This item shows bias in favor of the Brazilian group. The remaining eight items 1c, 2b, 3c, 4a, 4b, 9a, 9b, and 10c contained pictures that were difficult to recognize for both groups. For the subtest Stories no results are displayed as the Brazilian children did not report any problems with the recognition of pictures in this subtest.

Table 3. Items of the subtest Categories which contain pictures unknown to at least 20% of the Brazilian or Dutch children who failed the item

Categories
                                Brazilian group         Dutch group
Item   Picture                  N-wrong   % unknown     N-wrong   % unknown     Difference in %
1c     A4 (factory)             13        23.1          5         20.0            3.1
2b     A4 (electric outlet)     20        50.0          8          0.0           50.0*
2c     A3 (bird nest)           26        34.6          16         0.0           34.6*
3a     E1 (stopwatch)           30        26.7          13        38.5          -11.8
3a     E3 (thermometer)         30        20.0          13        30.8          -10.8
3b     A1 (dish rack)           29         6.9          18        33.3          -26.4
4a     E1 (bolt of textile)     56        12.5          41        65.9          -53.4*
4b     A2 (washcloth)           38        94.7          12         0.0           94.7*
5c     A4 (washtub)             40         7.5          16        31.3          -23.8
6a     A1 (sledge)              58        62.1          37         0.0           62.1*
8a     E3 (handlebars)          42        54.8          28         0.0           54.8*
9a     A4 (mosque)              47         8.3          12        33.3          -25.0*
9c     A2 (diagram)             18        50.0          25        44.0            6.0

Notes - E1, E2, and E3 are the examples that define the category; A1 to A5 are the alternatives to choose from.
- Positive differences indicate items relatively unknown to the Brazilian group.
- Differences significant at the 5% level are marked with an asterisk.

Table 4. Items of the subtest Situations which contain pictures unknown to at least 20% of the Brazilian or Dutch children who failed the item

Situations
                                Brazilian group         Dutch group
Item   Picture                  N-wrong   % unknown     N-wrong   % unknown     Difference in %
1c     A1 (chimney)             14        14.3          4         25.0          -10.7
2b     D (man with stick)       17        29.4          3         66.7          -37.3
3c     D (bath)                 29        31.0          2         38.5           -7.5
4a     A2 (pincers)             26        23.1          7          0.0           23.1
4b     A4 (vegetables)          22        28.2          7         14.3           13.9
9a     D (angry mother)         29         6.9          20        20.0          -13.1
9b     D (child sorts blocks)   40         5.0          23        21.7          -16.7
10a    D (biking contest)       43         0.0          28        21.4          -21.4*
10c    D (construction site)    26         3.8          25        24.0          -20.2

Notes - D is the main drawing with one to four pieces missing; A1 to A4 are alternatives to choose from.
In summary, the results of this procedure indicate that of a total of 80 items that were investigated, eight items seem to be culturally biased: seven items of the subtest Categories and one item of the subtest Situations. Of these eight items, five items favored the Dutch group and three items favored the Brazilian group. Thirteen items contained pictures that were difficult to recognize for both groups.
Item difficulty
The second procedure to assess the presence of item bias was based on the difficulties of the items (b-parameters) according to Item Response Theory (IRT). For this analysis the procedure of Differential Item Functioning (DIF) of the software program Bilog-MG (Zimowski et al., 1996) was used. The reference group in this analysis was the Dutch standardization sample of the SON-R 5½-17 of 1,350 children, while the 83 Brazilian children formed the focal group.
The first step in this procedure was to evaluate for each of the three SON-R subtests which IRT-model fitted best the data of the reference group and the focal group combined. For the subtests Categories and Situations the three-parameter model with a fixed c-parameter (“guess” parameter) showed the best model fit, while for the subtest Stories the two-parameter model showed the best fit. The next step was to test whether a DIF model or a non-DIF model fitted the data best. In a DIF model the two groups are considered as two independent groups with different b-parameters, while in the non-DIF models the two groups are treated as one group.
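To make the model concrete, the sketch below evaluates the three-parameter logistic item response function, which reduces to the two-parameter model when c is zero. All parameter values are hypothetical and chosen only for illustration; they are not estimates from the study, and no scaling constant is applied to the slope. The function name `p_correct` is likewise an assumption. Under a DIF model the slope a and the guess parameter c are shared, while the difficulty b may differ per group.

```python
import math

def p_correct(theta, a, b, c=0.0):
    """Three-parameter logistic model; with c=0 it is the two-parameter model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative (hypothetical) parameter values, not estimates from the study.
a, c = 1.0, 0.1
b_reference, b_focal = 0.0, 0.67   # same slope, different difficulty per group
theta = 0.0                        # a child of average ability
p_ref = p_correct(theta, a, b_reference, c)
p_foc = p_correct(theta, a, b_focal, c)
print(f"reference: {p_ref:.3f}, focal: {p_foc:.3f}")
```

For two children of equal ability, the one in the group with the higher b-parameter has the lower probability of a correct answer, which is what a DIF contrast in difficulty expresses.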
Table 5. Model fit of different IRT-models for the three SON-R subtests

             Non-DIF model        DIF model
             -2 log likelihood    -2 log likelihood    Difference    D.F.    C.R.
Categories   23,546               23,458               88            26      3.38*
Situations   26,760               26,600               160           32      5.00*
Stories      17,196               17,141               55            19      2.89*

Notes - The critical ratio (C.R.) is the ratio of the difference of the -2 log likelihood of the two models and the degrees of freedom (D.F.).
- When the critical ratio is greater than 1.96 it is statistically significant at the 5% level.
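The critical ratios in Table 5 can be recomputed directly from the reported -2 log likelihoods and degrees of freedom. The sketch below follows the definition given in the table notes; the function and variable names are chosen here for illustration.

```python
# -2 log likelihoods (non-DIF, DIF) and degrees of freedom as reported in Table 5.
model_fit = {
    "Categories": (23546, 23458, 26),
    "Situations": (26760, 26600, 32),
    "Stories": (17196, 17141, 19),
}

def critical_ratio(non_dif, dif, df):
    """Difference in -2 log likelihood between the two models, divided by its D.F."""
    return (non_dif - dif) / df

ratios = {name: critical_ratio(*values) for name, values in model_fit.items()}
for name, cr in ratios.items():
    print(f"{name}: C.R. = {cr:.2f}, above 1.96: {cr > 1.96}")
```

All three ratios exceed the 1.96 threshold stated in the notes, matching the asterisks in the table.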
Table 5 displays the -2 log likelihood of the non-DIF model and of the DIF model. The values of the DIF model are lower, which indicates a better model fit. The statistical test of the model fit is based on the difference of the log likelihoods (Camilli & Shepard, 1994). The difference of the log likelihoods and its degrees of freedom are displayed in the last columns of Table 5. The ratio of this difference and the degrees of freedom is called the critical ratio (C.R.). When this ratio is greater than 1.96, it is statistically significant at the 5% level. Table 5 shows that for all three SON-R tests the DIF-model fits the data significantly better. Thus the two groups were considered to be independent, and the b-parameters were estimated separately for each group.

Table 6. Items of the three SON-R subtests which show Differential Item Functioning (DIF) based on the item difficulties according to Item Response Theory

Categories
Item   Description item            Difference in b-parameter   Standard error
2c     animals                     -0.50*                      0.23
4c     toys                        -0.42*                      0.20
6a     means of transport          -0.67**                     0.17
6b     fasteners                    0.43*                      0.17
9c     signs                        0.81**                     0.30

Situations
Item   Description item            Difference in b-parameter   Standard error
1b     hunting a rabbit            -2.03**                     0.46
2b     playing with a dog          -0.97*                      0.47
2c     posting a letter            -1.46**                     0.35
3a     ruling traffic               0.83*                      0.35
3c     taking a bath               -2.02**                     0.44
4b     selling flowers             -0.59**                     0.18
5c     breaking dishes              0.45**                     0.17
7a     playing football             1.46**                     0.25
7b     watching the mirror          0.54**                     0.20
9c     jogging along the beach      0.65**                     0.18
10c    working in construction      0.97**                     0.32

Stories
Item   Description item            Difference in b-parameter   Standard error
6a     getting water at the well    0.42**                     0.16
9a     relaxing at the beach        0.43*                      0.19
9b     rowing with a boat           0.42*                      0.17

Notes - Differences in b-parameters significant at the 5% level are marked with one asterisk; differences significant at the 1% level are marked with two asterisks.
- Positive differences in b-parameters refer to items that are more difficult for the Dutch children, while negative differences indicate items more difficult for the Brazilian group.
Correlations between indices of item difficulty
Despite the significant differences in b-parameters for specific items, there is a strong overall correspondence between item difficulties in the Brazilian group and the Dutch standardization sample. Table 7 shows that the correlation between the p-values of the two groups varies from .90 (Situations) to .98 (Categories). The correlation between the b-parameters is .87 for Situations and .96 for Stories and Categories. That the correlations between the p-values and between the b-parameters of the two groups give such similar results is not surprising. In the Dutch standardization sample the correlation between the p-value and the b-parameter is close to -.97 for all three subtests. As an example, Figure 1 shows in a visual way the strong correspondence between the b-parameters of the subtest Categories of the focal and the reference group. It also shows that item 4a is too difficult in both groups in relation to the order of administration and that for the Brazilian group item 9c is easier than items 7c and 8c.
Table 7. Correlations between different indices of item difficulty of the three SON-R subtests in the Brazilian group (N=83) and in the Dutch standardization sample (N=1,350)

             p-value Netherlands /    b-parameter Netherlands /    p-value Netherlands /
             p-value Brazil           b-parameter Brazil           b-parameter Netherlands
             r                        r                            r
Categories   .98                      .96                          -.96
Situations   .90                      .87                          -.97
Stories      .95                      .96                          -.97

Note - All correlations are significantly different from zero at the 1% level of significance.
Figure 1. Plot of the b-parameters (item difficulties) of the items of subtest Categories for the Dutch standardization sample and the Brazilian focal group. Three panels (a-series, b-series, c-series) each place items 1 to 9 for the Netherlands and Brazil along a b-parameter axis running from -3.0 to 3.0.
Validity
The Brazilian pupils were evaluated by their school teachers with respect to the degree of motivation, cooperation and concentration they display during school hours. The evaluation of the teachers was given on a 3-point scale, ranging from “low”, via “average” to “high”. The rationale behind this evaluation is that in the standardization research of the SON-R 5½-17 a moderate correlation of .33 was found to exist between the level of motivation, concentration and cooperation of the pupils at school and their performance on the SON-R intelligence test. An average correlation of .66 was found between the teacher’s judgement and mean report marks. Apparently, factors such as concentration, motivation and cooperation are much more important for school achievement than for the performance on the SON-R intelligence test. Based on these findings, it was expected that also for the Brazilian pupils only moderate correlations would be found between the teacher’s judgement on the degree of their motivation, concentration, and cooperation and their performance on the subtests of the SON-R 5½-17.
Table 8. Correlations of the three SON-R subtests with teacher’s judgement of the motivation, concentration and cooperation of the Brazilian participants

             Motivation    Concentration    Cooperation
Categories   .39           .31              .38
Situations   .42           .31              .42
Stories      .32           .30              .25

Note - All correlations are significantly different from zero at the 1% level of significance.
Table 8 shows that the scores of the Brazilian children on the three subtests are, as expected, only moderately related to teacher’s judgement of their degree of motivation, concentration, and cooperation. Situations showed the highest correlations with these characteristics and Stories the lowest. The moderate correlations for the Brazilian children are quite similar to the correlations found in the standardization research of the SON-R 5½-17 in the Netherlands (Snijders et al., 1989).
- corrected for unreliability - of the subtests with these school marks. In the Brazilian group, the correlations with school marks on language are slightly higher than in the Dutch group, although none of the differences is significant at the 5% significance level.
The correlation of .60 for the Brazilian group between the scores on the subtest Categories and the school marks on mathematics is significantly higher than the correlation of .30 for the Dutch group. Also for the other two subtests the correlations with mathematics are higher in the Brazilian group, although these differences with the Dutch group are not significant at the 5% level of significance.
Discussion
The results of the first procedure used in this study indicate that 21 of the 80 items of the subtests Categories, Situations and Stories contain pictures that are difficult to recognize for the Brazilian children, the Dutch children or for both groups. Thirteen of these problematic items are probably not culturally biased as both Brazilian and Dutch children reported problems recognizing these pictures. Possible explanations for the observed difficulties with 6 of these 13 items are: (a) use of old-fashioned designs of the reproduced objects (stopwatch, thermometer, dish rack); (b) inclusion of pictures representing old-fashioned objects that are no longer in use (washtub); or (c) inclusion of pictures that are simply hard to recognize (factory, diagram). For the other seven problematic items that were difficult to recognize for both groups no good explanations could be found.
There are clear indications that 8 of the 21 items are culturally biased. Five of these items are biased in favor of the Dutch group and three in favor of the Brazilian children. Various explanations can be given why a relatively large proportion of the Brazilian children did not recognize certain pictures. Some pictures were not recognized because the reproduced objects are uncommon in Brazil (washcloth, sledge); other pictures were not recognized because the design of the object is quite different in Brazil compared to the Netherlands (electric outlet, handlebars of a bicycle). A possible explanation why more Dutch than Brazilian children had difficulties recognizing the textile bolt might be that the shops in the Netherlands are more modern than the ones in Brazil and less frequently display products like textile bolts. For the other two items with pictures that were difficult to recognize for the Dutch group no satisfactory explanation could be given.
With the procedure based on the IRT item difficulties, 19 items with DIF were identified: five items of Categories, 11 of Situations, and three of Stories. It is important to remark here that DIF indices as such do not provide immediate evidence of item bias. Content analysis of the items is required to judge the implications of DIF for cultural item bias. Especially in small samples, DIF statistics can produce incalculable Type I and Type II error rates (Camilli & Shepard, 1994). Therefore, after the DIF analyses we tried to find explanations for the differential functioning of items that could be associated with group membership.
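To make the idea of screening for DIF from item difficulties concrete, the sketch below compares difficulty estimates obtained separately in two groups. It is not the authors' procedure, whose exact DIF index is not described in this section. It assumes a simple standardized difference between group-centred difficulties, and every number in it is hypothetical. The function name `flag_dif` and the 1.96 cut-off are illustrative choices.

```python
import math


def flag_dif(b_ref, se_ref, b_focal, se_focal, critical=1.96):
    """Flag items whose centred difficulty differs between two groups.

    b_* are item difficulty estimates per group, se_* their standard errors.
    Returns a list of (index, difference, z, flagged) tuples.
    """
    mean_ref = sum(b_ref) / len(b_ref)
    mean_focal = sum(b_focal) / len(b_focal)
    results = []
    for i, (br, sr, bf, sf) in enumerate(zip(b_ref, se_ref, b_focal, se_focal)):
        # Centre within each group so that an overall level difference
        # between the groups is not mistaken for item-level DIF.
        diff = (bf - mean_focal) - (br - mean_ref)
        z = diff / math.sqrt(sr ** 2 + sf ** 2)
        results.append((i, diff, z, abs(z) > critical))
    return results


# Hypothetical difficulties for five items (NOT taken from the study).
b_dutch = [-1.2, -0.5, 0.0, 0.6, 1.1]
b_brazil = [-1.1, -0.6, 1.0, 0.5, 1.2]
se_dutch = [0.10, 0.10, 0.10, 0.10, 0.10]
se_brazil = [0.30, 0.30, 0.30, 0.30, 0.30]  # smaller sample, larger errors

dif_results = flag_dif(b_dutch, se_dutch, b_brazil, se_brazil)
for idx, diff, z, flagged in dif_results:
    # A positive difference means the item is relatively harder
    # for the second (focal) group.
    print(f"item {idx}: diff = {diff:+.2f}, z = {z:+.2f}, flagged = {flagged}")
```

A flag from such a screen is only a prompt for content analysis of the item. With small samples the standard errors are large, so both false alarms and misses are to be expected.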
Of the five items with DIF of the subtest Categories, only for one item (item 6a) could a convincing explanation be given. This item showed the highest value of bias in favor of the Dutch children. The bias is most likely due to the inclusion of a sledge as one of the correct alternatives, an object that is seldom or never used in Brazil as a consequence of its climate. This item was also detected as biased with the first procedure. No good reasons could be found to explain why DIF occurred with the remaining four items.
Of the items of Situations that showed DIF in favor of the Dutch children, items 1b, 2c and 3c displayed relatively large differences in item difficulties. In the case of items 1b and 3c the bias might be explained by the fact that the displayed activities, hunting a rabbit and taking a bath in a bathtub, are not regular activities in Brazil. In the case of item 2c (posting a letter), the explanation lies in a different design of the post boxes used in Brazil. For items 2b (playing with a dog) and 4b (selling flowers), the bias might likewise be explained by the fact that the displayed activities are not regular activities in Brazil. Especially the poorer Brazilians do not usually keep dogs as pets, and flowers are seldom sold out in the open. Item 7a (playing football) showed a large difference in item difficulty. The bias in favor of the Brazilian children might be explained by the central role that football plays in Brazilian daily life. For the other items with bias in favor of the Brazilian children no convincing explanation for the occurrence of DIF bias could be found. For the three items with DIF of the subtest Stories no convincing explanations could be found.
In summary, with the first procedure eight items were identified as biased, and with the second procedure seven items. Both procedures indicate item 6a as biased. Interestingly, the first procedure identified mainly items of Categories as biased, while the second procedure classified mainly items of Situations as biased. Altogether, 14 items were identified as biased (eight plus seven, counting item 6a only once). Of these, ten favored the Dutch children and four favored the Brazilian children. Taking into account that the total number of items investigated is 80, the negative effect of cultural bias for the Brazilian children is rather small.
Table 9. Correlations (corrected for unreliability) of the test scores on the three SON-R subtests with school marks on language and mathematics for a part of the Brazilian group (N = 33) and for a part of the Dutch norm sample (N = 490)

              Language                    Mathematics
              Brazilian    Dutch          Brazilian    Dutch
              (N = 33)     (N = 490)      (N = 33)     (N = 490)
Categories      .44          .26            .60          .30
Situations      .32          .30            .41          .27
Stories         .33          .22            .41          .24

One way to establish the effect of item bias is to analyze the correspondence in order of item difficulties between the Brazilian and Dutch children. The order of item difficulties is especially important for the SON-R 5½-17 since the subtests are administered in an adaptive way. For the effectiveness of the adaptive procedure, the order of item difficulty is essential. Results of this analysis revealed that the correspondence of the item difficulty is rather high. The correlation between the classical item difficulty (p-value) in Brazil and the Netherlands varies from .90 to .98. The correlation between the item difficulty based on IRT varies between .87 and .96. The weakest correspondence in order of item difficulty between the two compared groups was found for the subtest Situations. Apparently, the effect of item bias on the order of item difficulty in this subtest was stronger than in the other two subtests.
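The correspondence in item difficulty between two groups can be sketched as follows: compute the classical p-value (proportion correct) of every item in each group and correlate the two sets of p-values. The response matrices below are hypothetical and serve only to show the computation. The helper names `p_values` and `pearson` are illustrative, and the sketch assumes every child answered every item, which ignores the adaptive administration of the actual test.

```python
import math


def p_values(responses):
    """Classical item difficulty: proportion of correct answers per item.

    `responses` is a list of per-child lists with 1 (correct) or 0 (incorrect).
    """
    n_children = len(responses)
    n_items = len(responses[0])
    return [sum(child[i] for child in responses) / n_children
            for i in range(n_items)]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 0/1 response matrices (rows = children, columns = items).
group_a = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
group_b = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
]

p_a = p_values(group_a)
p_b = p_values(group_b)
r_difficulty = pearson(p_a, p_b)
print(p_a, p_b, round(r_difficulty, 2))
```

A high correlation indicates that items that are easy for one group also tend to be easy for the other, which is what an adaptive procedure relies on. A rank correlation could be substituted if only the order of the items matters.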
A basic question to be answered in this study was to what extent the occurrence of item bias influenced the validity of the test. In other words: what are the practical consequences of item bias for the valid use of the SON-R test? The results show that the validity of the three subtests in the Brazilian group is highly comparable to the validity in the Netherlands. Correlations of the subtest scores with the teacher's judgement of the motivation, cooperation and concentration of the Brazilian children are quite similar to the correlations with these characteristics found in the Netherlands (Snijders et al., 1989). The correlations of the subtest scores with school marks on language and mathematics in Brazil are also quite similar to those found in the Netherlands, with the exception of the correlation between the subtest Categories and mathematics, which is significantly higher for the Brazilian sample at the 5% level of significance. All three subtests of the SON-R show a high predictive validity of school performance in the Brazilian sample. Of course, it should be remarked here that the present study uses only a small sample of Brazilian children, and that for full validity evidence for the SON-R 5½-17 more validity studies with larger sample sizes should be conducted. Studies with larger sample sizes and all subtests of the SON-R 5½-17 would also allow the verification of the equality of its factor structure across Dutch and Brazilian samples. These studies would provide more evidence for the cross-cultural validity of the SON-R 5½-17.
Although the results of the present study indicate that the test can be used in Brazil in its current form, it provides valuable information on how to improve the pictorial contents in the next revision. Even if these changes might hardly affect the psychometric qualities of the test, the face validity will improve insofar as the contents become less dominated by Western European lifestyle. The present study also made apparent that some items of the subtests Categories and Situations have become outdated for use in the Netherlands and in Brazil. These results will be incorporated in the next revision of the SON-R 5½-17 as well.
References
Camilli, G. & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage Publishers.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston: Houghton Mifflin.
Cohen, J. (1992). Quantitative methods in psychology: A power primer. Psychological Bulletin, 112(1), 155-159.
Guilford, J. P. & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.
Helms-Lorenz, M. & Van de Vijver, F. J. (1995). Cognitive assessment in education in a multicultural society. European Journal of Psychological Assessment, 11(3), 158-169.
Hu, S. & Oakland, T. (1991). Global and regional perspectives on testing children and youth: an empirical study. International Journal of Psychology, 26(4), 329-344.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Laros, J. A. & Tellegen, P. J. (1991). Construction and validation of the SON-R 5½-17, the Snijders-Oomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Muñiz, J., Prieto, G., Almeida, L. & Bartram, D. (1999). Test use in Spain, Portugal and Latin American Countries. European Journal of Psychological Assessment, 15(2), 151-157.
Oakland, T., Wechsler, S., Bensuan, E. & Stafford, M. (1994). The construct of intelligence among Brazilian children – An exploratory study. School Psychology International, 15(4), 361-370.
Siegel, S. & Castellan, N. J., Jr. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). Tokyo: McGraw-Hill.
Snijders, J. T., Tellegen, P. J. & Laros, J. A. (1989). Snijders-Oomen Nonverbal Intelligence Test, SON-R 5½-17, Manual & Research Report. Lisse: Swets & Zeitlinger.
Tellegen, P. J. & Laros, J. A. (1993). The construction and validation of a nonverbal test of intelligence: the revision of the Snijders-Oomen tests. European Journal of Psychological Assessment, 9(2), 147-157.
Tellegen, P. J., Winkel, M., Wijnberg-Williams, B. J. & Laros, J. A. (1998). Snijders-Oomen Nonverbal Intelligence Test, SON-R 2½-7, Manual & Research Report. Lisse: Swets & Zeitlinger.
Ten Berge, J. M. F. & Zegers, F. E. (1978). A series of lower bounds to the reliability. Psychometrika, 43(4), 575-579.
Van de Vijver, F. J. R. & Hambleton, R. K. (1996). Translating tests: Some practical guidelines. European Psychologist, 1(2), 89-99.
Van de Vijver, F. J. R. & Poortinga, Y. H. (1992). Testing in culturally heterogeneous populations: When are cultural loadings undesirable? European Journal of Psychological Assessment, 8(1), 17-24.
Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492.
Zimowski, M. F., Muraki, E., Mislevy, R. J. & Bock, R. D. (1996). Bilog-MG: Multiple group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.
Received 14.06.2004. First editorial decision 06.07.2004. Final version 14.07.2004.