Better speech recognition for primary school app

Date of news: 2 December 2020

Language and speech technologists from Radboud University will work with Cito (institute for test development) to improve the automatic speech recognition of the app Beeldverhaal. With the app, which is currently still a prototype, primary school teachers will be able to get a clearer picture of students' speaking skills.

Assessing speaking skills in primary education is not easy, says language and speech technologist Catia Cucchiarini. ‘Teachers have to pay attention to many aspects of a conversation. Does a child tell a story with a beginning and an end? How about verb conjugation, grammar, vocabulary, hesitations? Because teachers also like to pay attention to the content, that's a lot to keep an eye on during the conversation.’ Cito, the Dutch institute that develops educational tests, therefore developed a prototype app, Beeldverhaal (Picture story), in 2019 to assess students' speaking skills. The student sees four pictures in the app and tells a story. The app records this story, and the recording is then converted to text with Automatic Speech Recognition (ASR). This way the teacher can analyze the speaking skills at a later, more convenient time.

Beeldverhaal - source: Cito


Cito received the NOT Innovation Award for the app in 2019. ‘But at the moment, Beeldverhaal is not yet working optimally, because the Automatic Speech Recognition is not yet fully functioning’, says Cucchiarini. In the coming year, researchers from the Center for Language and Speech Technology (CLST) and Cito will therefore work on improving the app's Automatic Speech Recognition. A budget of 20,000 euros has been reserved for the project, part of it made available by CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities). Cito contributes to the project by deploying its expertise.

‘For us, it ties in with other projects we are working on that involve children's speech’, says Cucchiarini. ‘However, those projects are about learning to read. In those cases, Automatic Speech Recognition works quite nicely, because you know in advance what students are going to read. This is not the case with Beeldverhaal, which involves more spontaneous speech, and that makes it more complex.’

In general, ASR is a lot more difficult with children than with adults. Children have shorter vocal cords that vibrate more quickly. Because the vocal cord vibration period is much shorter, it is more difficult to extract information from children's voices. Child speech is also very variable, partly because children grow quickly, so the length of their vocal cords and of their vocal tract keeps changing. ‘The differences between grades 3 and 6 are already enormous’, explains Cucchiarini. ‘And in addition, children produce more hesitations and broken-off words than adults.’


To adapt a speech recognizer built for adults to children's speech, a lot of recorded children's speech is needed to train it. CLST has been collecting speech from children for several years for various projects. Cito will collect additional speech during this project. The project is expected to take one year to complete. Ilse Papenbrug, project manager at Cito: ‘This collaboration is a great next step to make Beeldverhaal suitable for use in the classroom.’

For more information, please contact Catia Cucchiarini: