
3.4 Traditionalist reactions

3.4.3 Methodological standards

The traditionalists’ answer to the experimentalists’ call for more rigour and adherence to common methodological standards comes in two overall varieties.

One variety agrees in principle that methodological standards are important but disagrees that the traditional method is unscientific. The other variety disagrees altogether that living up to certain methodological standards should be a goal in itself.

On the first side, Newmeyer (2007, 396) agrees “in principle” with some of the points made by Featherston (2007) about applying quantitative standards in syntax but makes the point that more rigorous testing is only necessary if the rewards justify it (either through achieving positive effects or avoiding negative ones). Interestingly, this response accords with the proposals of Featherston (2009). In this paper, compared to Featherston (2007), he takes a more traditional-friendly approach and argues that while methodological rigour on some points is important, some aspects of experimental settings may not make a great difference in most cases, and therefore these can be ignored in most experiments. He recommends focusing on test materials and worrying less about subjects and specific methodology. Culicover and Jackendoff (2010) likewise argue that while proper control of stimulus material is of major importance in science, using experimental methods is in itself no guarantee that proper data practices are followed (going against the arguments of, e.g., Haider 2007, 2009).

Along the same lines, Phillips and Wagers (2007) argue that because the traditional method is so easy and inexpensive to check, not a lot of safeguards on the reliability of single data points are needed (contrary to psycholinguistics, where getting trustworthy information about one single fact is laborious, and where more explicit checks and transparency are therefore needed). On this approach, the proponents of XSyn are right in principle, but there is no need to worry in the case of traditional judgement collection, as the practice already satisfies the standards to a reasonable degree today.

The other type of reply is exemplified by Grewendorf (2007), who argues more explicitly against the focus on objectivity and standards of methodology as a goal in itself.

Instead of taking more care with the data and collecting reliable data, scientists often just disregard the phenomena in order to find principles that seem to give some deep insight into reality, where reality is taken to be represented by the abstract systems that are constructed rather than by an unstructured conglomeration of phenomena. There are numerous examples in the history of the natural sciences, some of them mentioned in Chomsky (2002), which show that criteria such as simplicity, elegance, and fruitfulness are as important in the assessment of theories as are reliable data. (Grewendorf, 2007, 379)

This also goes against the view that accounting for the data should have main priority in syntax (e.g., Featherston 2007).

The experimentalists’ calls for more transparency are not explicitly answered by the traditionalists. Sprouse and Almeida (2013a, 227), however, suggest that researchers should be more explicit about a) potential confounds that could have an impact on their results and the interpretation of these, and b) whether there exists empirical evidence that such a confound is actually affecting the results.

They thus agree with the general sentiment of the proponents of XSyn regarding transparency.

Some authors downplay the magnitude of the difference between XSyn and traditional syntax. Sprouse (2015), for instance, believes that it is a difference of degree rather than a difference of kind. He writes that both approaches make use of “tightly controlled experimental conditions” (presumably the presentation of controlled stimuli) to test “the same behavioural response” (intuitive judgements). The difference, he believes, is in the number of participants and stimulus items used (Sprouse, 2015, 89). He does mention one difference of kind, though: the background knowledge of participants. In most cases where the traditional method is used, trained linguists serve as subjects, whereas naive subjects tend to be used in most studies done with experimental methods. One worry about the use of experts as subjects is that there might be cognitive biases at work (as mentioned), but Sprouse (2015) points to some of the studies mentioned above that show a very high level of convergence between the results of the traditional and experimental methods. These results suggest that cognitive bias among the linguist-subjects might not have a large effect on the data.

3.4.4 Summing up

The experimentalists’ main objections to the traditional judgement collection method in syntax were related to the lack of sensitivity of the method, the potentially low reliability and validity of the method, and the lack of adherence to common standards in related disciplines (let us call these the sensitivity objection, the reliability objection, the validity objection, and the methodology objection).

The traditionalists’ main answer to the sensitivity objection seems to be that while some nuances are fine, it is not at all certain that all details revealed by experimental methods are relevant to the phenomenon of interest, i.e., grammaticality.

The reliability objection is answered by some traditionalists by showing how the traditional method yields highly reproducible results, while others make the point that valuable information on micro-variation may get lost when results are averaged across speakers. The validity objection, when addressed, is answered by claiming that although things like experimenter bias could in theory be problematic, it does not seem to pose a de facto problem for the traditional method.

There seem to be two strategies for answering the methodology objection as well. Some authors think that we can show that the traditional method is fully adequate as it is,9 and some add that using an experimental procedure is, in and of itself, no guarantee that greater care will be taken with data and theoretical issues. Others instead believe that the main point of a scientific test is to give valuable insights, not to be maximally methodologically rigorous or objective according to some standard.

9Proponents of XSyn might argue, however, that it took experimentalists raising the issue and the use of experimental methods to find this out, while this is something users of the traditional method should have worried about themselves all along.

While this discussion has been phrased in terms of proponents and opponents of the traditional method, in reality the discussed works can be seen as aligned along a continuum. Some authors, like Wasow and Arnold (2005), who argue that the traditional method should be supplanted completely by experimental methods, are at the extreme experimental end of the spectrum. Most authors are somewhere in between. Those at the more experimental end advocate that experimental methods should be used in general but that in some cases it might be enough to use the traditional method (e.g., Featherston 2009), and those at the more traditional end (like Phillips 2009) argue that the traditional method should in general be used but that in some cases, it might be beneficial to use experimental methods. Interestingly, even the most ardent defenders of the traditional method all seem to agree with at least some parts of the experimentalist programme (at least in principle), and the extreme traditionalist end of the spectrum is thus (at least in principle) unoccupied.

3.5 Experimentalist reactions

3.5.1 Convergence of expert and lay intuitive judgements

Arguably, one of the most interesting results to come out of the discussion between the experimentalists and traditionalists is the work by Sprouse and colleagues showing that there is a high level of convergence between the judgements of lay subjects and those of trained linguists. How do the experimentalists reply to that?

Gibson et al. (2013) take up the discussion with Sprouse and Almeida (2013a) about the reliability of the traditional method. In their opinion, since linguistic theories are meant to get every grammaticality assignment right, it is not necessarily the case that an error rate of 2–5% is acceptable (Gibson et al., 2013, 4). Gibson et al. argue that we want our syntactic theories to account for not just one or a few but a large number of phenomena. However, their calculations show that if we allow for an error rate of 5% (and assume the standard power level of 80%), then we can only build our theory on fewer than five data points before we are likely to get a theory that is based on incorrect data. If any data point that the theory is built on is wrong, the theory will be wrong in some way or another as well, and non-quantitative methods have no way to detect and correct such errors. Even with low error rates, we have no way of knowing which data points are correct, and there will be incredibly many ways to divide up the set of data, so we will not be able to go through each one. With experimental methods, on the other hand, one has the option of lowering the false positive rate by adding more participants. Experimental methods not only give us good data, they also tell us something about the reliability of specific sets of data and ways to improve the reliability when necessary.
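The arithmetic behind this worry can be sketched as follows. This is our own illustration, not Gibson et al.'s actual calculation (which also factors in statistical power): assuming each judgement is independently wrong with probability p, the chance that a theory built on n data points rests on at least one incorrect judgement is 1 − (1 − p)^n, which grows quickly with n.

```python
# Illustrative sketch (not Gibson et al.'s exact model): how a per-judgement
# error rate compounds across the data points a theory is built on.

def prob_at_least_one_error(p: float, n: int) -> float:
    """Probability that at least one of n independent data points is wrong,
    given a per-point error probability p."""
    return 1 - (1 - p) ** n

# With a 5% error rate, the risk of building on at least one bad data
# point rises steeply as the empirical base grows.
for n in (1, 5, 14, 50):
    print(n, round(prob_at_least_one_error(0.05, n), 3))
```

Under these simplified assumptions, the probability of at least one error already exceeds 50% at around 14 data points; the exact threshold depends on the assumptions one makes, but the qualitative point stands: without quantitative methods there is no way to tell which points are the bad ones.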

They also note that the estimates made in Sprouse and Almeida (2013a) based on journal data hinge on a random sample of the standard acceptability judgements published in the journal Linguistic Inquiry from 2001 to 2010.

Thus, they point out, the sample contains a lot of completely uncontroversial judgements (e.g., that “was kissed John” is unacceptable, Gibson et al. 2013, 3) and not just those intuitive judgements that differentiate current theories.

For that reason, Gibson et al. think that Sprouse and Almeida’s estimate of the convergence between experts and lay subjects is inflated. Therefore, they argue, the sample cannot be used to show that the traditional method can be used to choose between the theories that are debated today, which was one of the original motivations for XSyn.

Gibson et al. (2013) also take issue with the argument put forward by Sprouse and Almeida (2013a) that the statistical power needed to detect the type of effects linguists are typically after is usually quite low, and that a small number of judgements may therefore suffice to gain sound results about those effects in general. Gibson et al. (2013) argue that even if most syntactic questions have large effect sizes, one still needs to collect data and calculate the effect size for the specific contrast one is working on to verify that this phenomenon is, in fact, one of those with a large effect size. There is no way of knowing this without doing the experimental study. Otherwise, they point out, you get a false sense of confidence in your results (Gibson et al., 2013, 5).
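The kind of calculation Gibson et al. have in mind can be illustrated with a small sketch. The ratings below are simulated and the helper is our own, not anything from the paper; the point is simply that an effect size such as Cohen's d is a quantity estimated from collected data, not something one can assume in advance for a given contrast.

```python
# Hedged illustration: estimating the effect size of an acceptability
# contrast from (simulated) rating data. Not Gibson et al.'s procedure.
import random
import statistics

random.seed(0)  # fixed seed so the simulated data are reproducible

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical 7-point acceptability ratings for two sentence types,
# 20 simulated participants per condition.
acceptable   = [random.gauss(5.5, 1.0) for _ in range(20)]
unacceptable = [random.gauss(3.0, 1.0) for _ in range(20)]

d = cohens_d(acceptable, unacceptable)
print(f"estimated effect size d = {d:.2f}")
```

Only after running such a study does one know whether the contrast at hand really belongs to the large-effect-size cases that Sprouse and Almeida's power argument presupposes.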