Combining citizen science, Handwritten Text Recognition and Entity Extraction

Tuesday 25 June 2024, 12 pm - 1 pm
A hybrid method to transcribe 19th and 20th century Curaçaoan death certificates

In this lecture, Lisa Hoek (MS in Data Science) presents the research pipeline that the 'Historical Database of Suriname and the Caribbean' project group uses to transcribe historical death certificates from Curaçao, combining Transkribus and (Chat-)GPT.

Lisa Hoek will dive into the technical details: how the research group set up layout detection, how they trained HTR models, and why they chose GPT over regular expressions for entity extraction. The result is a hybrid method that draws on both AI and the crowd, improving efficiency while preserving transcription quality.

This lecture, a follow-up to a presentation by Björn Quanjer earlier this spring, is intended for anyone curious about the technical aspects and possibilities of this hybrid workflow combining HTR, AI and citizen science.

Registration is not necessary.