
Improving Table Structure Recognition Based on Content-Based Post-Processing

Hoang Huu Son, Nguyen Duc Dung, Nghiem Thi Phuong, Tran Giang Son

2025 · DOI: 10.1109/MAPR67746.2025.11133962
International Conference on Multimedia Analysis and Pattern Recognition

TLDR

This paper proposes a lightweight and effective post-processing method based on content analysis and rule-based correction, applied after table structure recognition, that improves the overall TEDS score and increases the number of perfectly recognized tables, while requiring minimal computational overhead.

Abstract

Table Structure Recognition (TSR) is a critical task in document understanding, particularly in financial and administrative domains where table layouts follow strict formatting rules. While state-of-the-art models such as CascadeTSRNet offer high performance in recognizing table structures, they often overlook content-level semantics, leading to errors in logically structured tables. In this paper, we propose a lightweight and effective post-processing method based on content analysis and rule-based correction, applied after table structure recognition. By combining OCR outputs with heuristic rules derived from financial document standards, our method identifies and rectifies formatting inconsistencies to enhance logical structure. Experimental results on the FinTabNet.C dataset demonstrate that this approach improves the overall TEDS score and increases the number of perfectly recognized tables (TEDS = 1), while requiring minimal computational overhead. Our method also shows promising performance when integrated with both traditional OCR engines and large vision-language models (e.g., Qwen2.5-VL-3B-Instruct).
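
The abstract does not list the paper's actual correction rules, so the following is only a minimal sketch of what one content-based rule might look like. It assumes a recognized table row is available as a list of OCR cell strings, and it uses a hypothetical heuristic: a cell holding two numeric amounts next to an empty cell probably reflects two columns that were merged during structure recognition. The names `NUMBER`, `split_merged_numeric_cells`, and `summarize_teds` are illustrative and do not come from the paper.

```python
import math
import re

# Hypothetical pattern for a financial amount: optional parentheses or minus
# sign for negatives, optional "$", digits with thousands separators,
# optional decimals, optional "%".
NUMBER = r"\(?-?\$?\d[\d,]*(?:\.\d+)?%?\)?"
TWO_NUMBERS = re.compile(rf"^({NUMBER})\s+({NUMBER})$")


def split_merged_numeric_cells(row):
    """Move the second amount of a two-amount cell into an empty right neighbor.

    `row` is a list of OCR cell strings. This is an assumed example rule,
    not one taken from the paper.
    """
    fixed = list(row)
    for i in range(len(fixed) - 1):
        match = TWO_NUMBERS.match(fixed[i].strip())
        if match and not fixed[i + 1].strip():
            fixed[i], fixed[i + 1] = match.group(1), match.group(2)
    return fixed


def summarize_teds(scores):
    """Return (mean TEDS, number of tables with TEDS = 1) for per-table scores."""
    mean = sum(scores) / len(scores)
    perfect = sum(1 for s in scores if math.isclose(s, 1.0))
    return mean, perfect


if __name__ == "__main__":
    print(split_merged_numeric_cells(["Revenue", "1,234 5,678", ""]))
    print(summarize_teds([1.0, 0.5, 1.0]))
```

A rule of this kind touches only cell text and a few regular-expression matches per row, which is consistent with the minimal overhead the abstract reports. The helper `summarize_teds` mirrors the two quantities the abstract evaluates: the overall TEDS score and the count of perfectly recognized tables.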