The Historical Database Suriname and Curacao (HDSC) is building a data infrastructure covering the inhabitants of Suriname and Curacao between 1828 and 1950. Digitizing the complete civil registry of both countries will allow researchers to study the social, cultural, and demographic history of two tropical colonial societies rooted in slavery and indentured labour. Making all data publicly accessible will also facilitate family history research.
The team is currently transcribing the civil registry via the crowdsourcing platform HET VOLK (https://hetvolk.org/). With hundreds of thousands of certificates, transcription by citizen scientists alone will take years. The team is therefore exploring how automated handwritten text recognition (HTR) and entity recognition can be integrated into the workflow.
This project focuses on developing a method to extract information from the HTR output and to store it in a structured database format. Natural Language Processing (NLP) seems a suitable method to recognize the desired entities. Together with the eScience Center, the team will develop and train a sufficiently accurate NLP model to speed up the transcription process and to relieve the pressure on the volunteers.
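To make the intended step from recognized text to database records concrete, the following is a minimal sketch and not the project's method. It assumes an invented English sample sentence (real certificates are in Dutch and the HTR output is noisier), and it uses a hand-written regular expression as a stand-in for the trained NLP model. The names `SAMPLE`, `PATTERN`, `extract_entities`, `store`, and the `person` table are all hypothetical.

```python
import re
import sqlite3

# Invented sample of HTR output (assumption); real certificates are Dutch.
SAMPLE = "On 12 March 1875 appeared Johannes Pinas, aged 34, carpenter."

# Hypothetical rule-based pattern standing in for a trained entity recognizer.
PATTERN = re.compile(
    r"On (?P<date>\d{1,2} \w+ \d{4}) appeared "
    r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)+), aged (?P<age>\d+), (?P<occupation>\w+)"
)


def extract_entities(text):
    """Return a dict of recognized fields, or None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["age"] = int(fields["age"])  # store age as a number, not text
    return fields


def store(conn, record):
    """Write one extracted record to a (hypothetical) person table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person "
        "(date TEXT, name TEXT, age INTEGER, occupation TEXT)"
    )
    conn.execute(
        "INSERT INTO person VALUES (:date, :name, :age, :occupation)", record
    )
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    entities = extract_entities(SAMPLE)
    if entities is not None:
        store(connection, entities)
    print(connection.execute("SELECT * FROM person").fetchall())
```

In a workflow like the one described, a trained model would replace `extract_entities`, and records that fail to match could be routed to the volunteers for manual transcription.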
This project is closely related to the project Historical Database Suriname and Curacao (HDSC).