Fine-Tuning Pre-Trained Language Models for Improved Retrieval in RAG Systems for Domain-Specific Use

Syed Arham Akheel

2024 · DOI: 10.36948/ijfmr.2024.v06i05.22581
International Journal For Multidisciplinary Research · 1 Citation

TLDR

This paper provides a comprehensive review of the literature on the fine-tuning of LLMs to optimize retrieval processes in RAG systems and discusses advancements such as Query Optimization, Retrieval-Augmented Fine Tuning (RAFT), Retrieval-Augmented Dual Instruction Tuning (RA-DIT), as well as frameworks that enhance the synergy between retrievers and LLMs.

Abstract

Large Language Models (LLMs) have significantly advanced natural language understanding and generation capabilities, but domain-specific applications often require supplementation with current, external information to mitigate knowledge gaps and reduce hallucinations. Retrieval-Augmented Generation (RAG) has emerged as an effective solution, dynamically integrating up-to-date information through retrieval mechanisms. Fine-tuning pre-trained LLMs with domain-specific data to optimize retrieval queries has become an essential strategy for enhancing RAG systems, especially for ensuring that highly relevant information is retrieved from vector databases for response generation. This paper provides a comprehensive review of the literature on the fine-tuning of LLMs to optimize retrieval processes in RAG systems. We discuss advancements such as Query Optimization, Retrieval-Augmented Fine Tuning (RAFT), and Retrieval-Augmented Dual Instruction Tuning (RA-DIT), as well as frameworks like RALLE, DPR, and ensembles of retrieval-based and generation-based systems that enhance the synergy between retrievers and LLMs.
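
To make the retrieval step mentioned in the abstract concrete, the following is a minimal sketch of how a RAG pipeline might select passages and assemble a prompt. It is not the paper's method: the bag-of-words `embed` function is a toy stand-in for a learned (possibly fine-tuned) encoder, the in-memory list stands in for a vector database, and all function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Concatenate retrieved passages and the question into one generator prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical domain documents, used only for illustration.
documents = [
    "aspirin reduces fever and pain",
    "the court ruled on the patent dispute",
    "ibuprofen treats pain and inflammation",
]
query = "what treats pain"
prompt = build_prompt(query, retrieve(query, documents, k=2))
```

In a real system the resulting `prompt` would be passed to the generator LLM. Fine-tuning approaches of the kind the paper reviews would aim to improve the encoder behind `embed`, the query itself, or the generator's use of the retrieved context.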
