Multimodal LLM-assisted Information Extraction from Historical Documents: The Case of Swedish Patent Cards (1945-1975) and ChatGPT
Yunting Xie, Matti La Mela, Fredrik Tell
TLDR
This paper develops a pipeline that uses the GPT-4o model to retrieve text and events from Swedish historical patent cards in order to extend the Swedish historical patent database, and concludes that the model generates usable yet imperfect data, which speeds up data collection and reduces its cost.
Abstract
This paper presents an AI-assisted method for information extraction from historical documents using a multimodal large language model (MLLM). We develop a pipeline that uses the GPT-4o model to retrieve text and events from Swedish historical patent cards in order to extend the Swedish historical patent database. Our study demonstrates how generic MLLMs can save time and labor costs when creating usable data in a low-resource setting, a common challenge for digital humanities projects that seek to leverage the latest AI technologies. We also explore error flagging for automated text recognition, which can be integrated into traditional information extraction workflows: the MLLMs' vision capabilities help identify documents with potential errors that require human verification. We conclude that the model generates usable yet imperfect data, which speeds up data collection and reduces its cost. The flags created alongside the extracted information make it possible to evaluate the model's performance and to allocate human resources to actual error correction through manual transcription. Given the recent rapid development of open MLLMs, a promising next step is to explore local solutions for fine-tuning and applying these models.
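As a rough illustration of the flag-based triage described in the abstract, the Python sketch below parses one JSON reply per patent card and routes flagged cards to manual transcription. It is not the authors' implementation: the field names (`text`, `events`, `flag`), the `CardRecord` class and the helpers `parse_reply`, `triage` and `flag_rate` are hypothetical, the abstract does not specify the output schema or how flags are represented, and the actual GPT-4o call is omitted.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CardRecord:
    # Hypothetical schema: field names are assumptions, not the paper's format.
    card_id: str
    text: str
    events: list = field(default_factory=list)
    # Assumed boolean set by the model when it sees a possible recognition error.
    flagged: bool = False


def parse_reply(card_id: str, raw_json: str) -> CardRecord:
    """Convert one (assumed) JSON reply from the MLLM into a CardRecord."""
    data = json.loads(raw_json)
    return CardRecord(
        card_id=card_id,
        text=data.get("text", ""),
        events=list(data.get("events", [])),
        # A missing "flag" key is treated as "not flagged" (an assumption).
        flagged=bool(data.get("flag", False)),
    )


def triage(records: list[CardRecord]) -> tuple[list[CardRecord], list[CardRecord]]:
    """Split records into auto-accepted ones and ones queued for manual transcription."""
    accepted = [r for r in records if not r.flagged]
    to_review = [r for r in records if r.flagged]
    return accepted, to_review


def flag_rate(records: list[CardRecord]) -> float:
    """Share of flagged cards; a crude indicator of how much human effort is needed."""
    if not records:
        return 0.0
    return sum(r.flagged for r in records) / len(records)
```

In this sketch only the flagged cards consume transcriber time, which mirrors the abstract's point about allocating human resources; whether the real pipeline also spot-checks unflagged cards is not stated here.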
