
*CANCELLED* CLS Talk: Nathan Vandeweerd

Thursday 21 November 2024, 4 pm - 5 pm
Using crowdsourced comparative judgement and rubric-based rating to grade texts in the ICLE corpus: a report on reliability and validity.

This CLS talk has been cancelled. 

CLS Talks showcase research done within the Centre for Language Studies (CLS), with the aim of increasing awareness of ongoing research in the institute and of facilitating discussions and collaborations between researchers. In addition, several external speakers are invited to share their work.

The sessions take place every month on Thursdays at 16:00 and are open to all interested researchers.

Abstract

Comparative judgement (CJ), an assessment method in which judges are shown pairs of texts side-by-side and asked to choose which is “better”, has recently been introduced as a way to generate reliable and valid proficiency scores for texts in learner corpora (Paquot et al., 2022). Recent (small-scale) studies have shown this approach to be effective for evaluating argumentative essays of varying lengths, even when texts cover a narrow proficiency span (e.g. CEFR B2-C1) or diverse essay prompts (Thwaites, Kollias, et al., 2024). They have also found that CJ assessments made by judges recruited through a crowdsourcing platform have similar validity and reliability to those made by linguists recruited through a community-driven approach (Thwaites, Paquot, et al., 2024).
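To make the CJ procedure concrete, the sketch below shows one common way pairwise "which text is better?" judgements can be turned into a single quality scale: a Bradley-Terry model fitted with the classic MM algorithm. This is an illustrative assumption, not the authors' actual pipeline, and the toy judgement data are invented for demonstration.

```python
# Minimal Bradley-Terry sketch: estimate a strength score per text from
# (winner, loser) pairs and rank the texts by that score.
from collections import defaultdict

def bradley_terry(judgements, n_iter=200, smoothing=0.1):
    """Estimate a strength score per text from (winner, loser) pairs via MM updates."""
    texts = {t for pair in judgements for t in pair}
    strength = {t: 1.0 for t in texts}
    wins = defaultdict(float)
    for winner, _ in judgements:
        wins[winner] += 1.0
    for _ in range(n_iter):
        new = {}
        for t in texts:
            # Sum 1 / (s_t + s_opponent) over every comparison involving t.
            denom = sum(
                1.0 / (strength[a] + strength[b])
                for a, b in judgements if t in (a, b)
            )
            # Small smoothing keeps texts with zero wins off the boundary.
            new[t] = (wins[t] + smoothing) / denom if denom else strength[t]
        total = sum(new.values())
        strength = {t: s / total for t, s in new.items()}  # normalise each pass
    return strength

# Toy example: each tuple is (preferred text, other text) from one comparison.
judgements = [("T1", "T2"), ("T1", "T3"), ("T2", "T3"), ("T1", "T2"), ("T3", "T2")]
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # texts ordered from strongest to weakest
```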

This presentation reports on an ongoing large-scale study investigating the extent to which rubric-based judges and CJ raters focus on the same linguistic features when assessing texts. A CJ task was created in which professional raters (N=66) assessed a representative sample of 1300 texts from the ICLE corpus. Text-based measures representing the main rubric constructs (e.g., lexical complexity, cohesion) were then calculated on a subset of these texts (N=222) which had previously been manually error-annotated and assessed on the basis of the CEFR rubric in the context of another project (Thewissen, 2013).
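As an illustration of the kind of text-based measure referred to above, the sketch below computes two simple lexical diversity indices. These are assumptions for demonstration only, not the study's actual operationalisation of the rubric constructs.

```python
# Two simple lexical diversity indices computed from a tokenised text.
import math
import re

def tokenise(text: str) -> list[str]:
    """Lowercase word tokens via a simple regex; real pipelines use proper tokenisers."""
    return re.findall(r"[a-z']+", text.lower())

def ttr(tokens: list[str]) -> float:
    """Type-token ratio: unique words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def guiraud(tokens: list[str]) -> float:
    """Guiraud's index (root TTR), less sensitive to text length than plain TTR."""
    return len(set(tokens)) / math.sqrt(len(tokens)) if tokens else 0.0

sample = "The learner wrote a short argumentative essay about language learning."
tokens = tokenise(sample)
print(round(ttr(tokens), 3), round(guiraud(tokens), 3))
```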

The results showed that the rank order produced by the expert judges was highly reliable (SSR = .823) and that 30% of the variance in rank order could be explained by the CEFR level of the text. Higher-ranked texts were also found to have fewer errors and higher levels of lexical sophistication, lexical diversity, syntactic complexity and cohesion. Taken together, these results suggest that comparative judgement can be used to efficiently evaluate L2 texts and that the resulting rank order is a reliable and valid representation of the proficiency level of the texts.
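For readers unfamiliar with the statistics mentioned here, the sketch below shows one common way the Scale Separation Reliability (SSR) of CJ score estimates and the proportion of variance explained by CEFR level (R²) are computed in the CJ literature. The study's own procedure may differ, and the numbers are invented.

```python
# SSR and R^2 computed on invented CJ scores, standard errors, and CEFR levels.
import numpy as np

def ssr(estimates: np.ndarray, standard_errors: np.ndarray) -> float:
    """SSR = (observed variance of estimates - mean squared SE) / observed variance."""
    obs_var = np.var(estimates, ddof=1)
    return (obs_var - np.mean(standard_errors ** 2)) / obs_var

def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Proportion of variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Invented example: CJ scores, their standard errors, and numeric CEFR levels.
scores = np.array([-1.2, -0.4, 0.1, 0.8, 1.5, 2.0])
ses = np.array([0.35, 0.30, 0.32, 0.28, 0.33, 0.31])
cefr = np.array([3, 3, 4, 4, 5, 5])            # e.g. B1=3, B2=4, C1=5
slope, intercept = np.polyfit(cefr, scores, 1)  # simple linear regression
print(round(ssr(scores, ses), 3),
      round(r_squared(scores, slope * cefr + intercept), 3))
```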


References

Paquot, M., Rubin, R., & Vandeweerd, N. (2022). Crowdsourced Adaptive Comparative Judgment: A community-based solution for proficiency rating. Language Learning, 72(3), 853–885. https://doi.org/10.1111/lang.12498

Thewissen, J. (2013). Capturing L2 Accuracy Developmental Patterns: Insights From an Error‐Tagged EFL Learner Corpus. The Modern Language Journal, 97(S1), 77–101. https://doi.org/10.1111/j.1540-4781.2012.01422.x

Thwaites, P., Kollias, C., & Paquot, M. (2024). Is CJ a valid, reliable form of L2 writing assessment when texts are long, homogeneous in proficiency, and feature heterogeneous prompts? Assessing Writing, 60, 100843. https://doi.org/10.1016/j.asw.2024.100843

Thwaites, P., Paquot, M., & Vandeweerd, N. (2024). Crowdsourced comparative judgement for evaluating learner texts: How reliable are judges recruited from an online crowdsourcing platform? Applied Linguistics, 1–18. https://doi.org/10.1093/applin/amae048

When
Thursday 21 November 2024, 4 pm - 5 pm
Location
E9.14
Contact information

If you have any questions, or if you are not on the CLS mailing list but would like to receive notice about an upcoming talk, please send a message to clstalks [at] ru.nl.