NLP Framework for Analysis and Classification of Construction Documentation: A Comparative Study of Transformer and Recurrent Neural Network Architectures

TLDR

Results demonstrate that domain-specific fine-tuned BERT models achieve superior performance with 94.2% accuracy for document classification and 87.8% F1-score for information extraction tasks, significantly outperforming traditional approaches.

Abstract

The construction industry generates vast amounts of textual documentation, including specifications, contracts, claims, and change orders, that require extensive manual review and analysis. This paper presents a comparative study of state-of-the-art Natural Language Processing (NLP) models for automated processing of construction documents. It evaluates four approaches: traditional machine learning with TF-IDF features, LSTM-based recurrent neural networks, BERT-based transformer models, and domain-specific fine-tuned BERT models. The dataset comprises 2,847 construction documents across four categories, and performance is measured by accuracy, precision, recall, and F1-score. Results show that the domain-specific fine-tuned BERT models perform best, with 94.2% accuracy on document classification and an 87.8% F1-score on information extraction, significantly outperforming traditional approaches. The findings inform the implementation of automated document processing systems in construction project management.
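To make the four reported metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score could be computed for a four-category classifier. It is illustrative only: the abstract does not state which averaging scheme was used, so macro averaging is an assumption here, and the function name `classification_metrics`, the category labels, and the toy predictions are hypothetical rather than taken from the study's dataset.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption;
    the study may have used a different averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    def safe_div(numerator, denominator):
        # Define the ratio as 0.0 when a class has no relevant samples.
        return numerator / denominator if denominator else 0.0

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(safe_div(2 * p * r, p + r))

    n_classes = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical toy labels, not data from the study.
y_true = ["specification", "contract", "claim", "change_order", "claim", "contract"]
y_pred = ["specification", "contract", "claim", "change_order", "contract", "contract"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under macro averaging every category counts equally regardless of how many documents it contains, which matters if the four categories are imbalanced. A weighted or micro average would give different figures on the same predictions.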