IDPFlow: A No-Code Agentic Framework for Multimodal Intelligent Document Processing

Goutham Vignesh, Harikrishnan P M, 2 Authors, Vishal Vaddina

2025 · DOI: 10.1145/3746027.3761838

TLDR

IDPFlow is a novel framework that unifies a no-code, user-centric interface with a tool-augmented agentic architecture for end-to-end multimodal document processing, empowering experts such as business analysts and legal professionals to independently build and deploy sophisticated workflows without writing any code.

Abstract

Intelligent Document Processing (IDP) is critical for unlocking actionable insights from the vast volume of unstructured documents such as invoices and medical reports, yet its promise often goes unfulfilled because implementation is hindered by significant technical barriers. Traditional IDP systems require deep expertise in programming, machine learning, and intricate model fine-tuning, creating a dependency on specialized data science teams. This effectively sidelines domain experts, the very individuals who possess the critical contextual understanding of the documents, thereby limiting the agility and accuracy of workflow automation. This paper introduces IDPFlow, a novel framework that unifies a no-code, user-centric interface with a tool-augmented agentic architecture for end-to-end multimodal document processing, empowering experts such as business analysts and legal professionals to independently build and deploy sophisticated workflows without writing any code. IDPFlow is built upon an agentic architecture that uses a versatile toolkit to execute a spectrum of high-precision IDP tasks, such as multi-class document classification, document visual question answering (Doc-VQA), key information extraction from text, tables, and checkboxes, and long-document summarization. The core of IDPFlow is its dynamic agentic workflow, which redefines user interaction. Upon document upload, the agentic system immediately analyzes the content, classifying sub-documents and proactively suggesting a comprehensive data schema relevant to the use case, which shifts the user's role from workflow builder to supervisor. This initial workflow is not static; it can be refined in real time through simple, conversational instructions, enabling true business agility.
Furthermore, the agentic intelligence extends to reusability, allowing existing workflows to be intelligently adapted for new, related tasks, dramatically reducing development time for subsequent use cases. For particularly complex tasks involving long or dense documents, the agentic system can leverage a specialized Multimodal Retrieval-Augmented Generation (MMRAG) pipeline to overcome the context window limitations of standard LLMs. This pipeline utilizes the ColPali model, which excels at generating unified multimodal embeddings, ensuring robust and accurate information retrieval from both textual content and embedded images or diagrams. To foster user trust and ensure verifiability, IDPFlow incorporates a grounded traceback citation mechanism that automatically highlights the precise document segments from which the agent derived its responses, making all outputs transparent and easily auditable. We highlight three key advantages of the framework: 1) Accessibility via an intuitive interface for domain experts; 2) Deep Adaptability and Reusability through dynamic agentic refinement and extensible tools; and 3) Trustworthiness rooted in a verifiable RAG pipeline and granular citation. The framework is projected to reduce end-to-end workflow creation time by 60-70% compared to traditional methods. Its unique combination of a no-code interface and a tool-augmented agentic architecture bridges the gap between technical complexity and domain expertise, accelerating the deployment of powerful, transparent, and scalable IDP solutions across industries.
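
The retrieval step of the MMRAG pipeline can be illustrated with a small sketch. The abstract does not spell out how pages are scored, so the following assumes the late-interaction (MaxSim) scoring that ColPali-style retrievers are known for: each query token vector is matched against its most similar page patch vector, and the per-token maxima are summed. The function names (`maxsim_score`, `rank_pages`), the toy two-dimensional vectors, and the idea of returning the page index as an anchor for a traceback citation are illustrative assumptions, not IDPFlow's actual implementation.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction score (assumed scoring rule, not confirmed by the abstract).

    For every query vector, take the similarity of its best-matching
    page vector, then sum those maxima over the whole query.
    """
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector],
    pages: Sequence[Sequence[Vector]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (page_index, score) pairs for the top_k highest-scoring pages.

    The page index is kept so a downstream step could point back to the
    source page (hypothetical stand-in for a traceback citation).
    """
    scored = [(i, maxsim_score(query_vecs, vecs)) for i, vecs in enumerate(pages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings: two query token vectors, three pages of patch vectors.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = [
        [[1.0, 0.0], [0.5, 0.5]],  # page 0
        [[0.0, 1.0], [1.0, 0.0]],  # page 1
        [[0.1, 0.1]],              # page 2
    ]
    print(rank_pages(query, pages))  # [(1, 2.0), (0, 1.5)]
```

In a real pipeline the vectors would come from the embedding model rather than being written by hand, and only the top-ranked pages would be passed to the language model, which is how retrieval sidesteps the context window limit described above.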