A cognitive domain specific framework integrating large language model for COVID-19 vaccine sentiment analysis

L. Prasika, Samuel Nadar Edward Rajan

2025 · DOI: 10.1177/1088467x251348350
Intelligent Data Analysis · 0 Citations

TLDR

This paper presents COVID-19 Retrieval Augmented and Fine-Tuning (RAFT), a novel framework that analyses COVID-19 vaccine tweets using retrieval-augmented approaches and integrates domain-specific knowledge through retrieval-augmented generation with external knowledge sources.

Abstract

In this cognitive era, vast amounts of data are accumulated every day, and analysing such unstructured information to obtain insights is challenging. To address this, large language models have been developed to support the analysis of extensive data corpora. However, they tend to hallucinate when proper knowledge sources are lacking. When the analysis concerns a specific field such as healthcare or finance, the challenge is compounded by a lack of domain specificity. COVID-19 sentiment analysis is a complex responsibility for governments, which need to understand public opinion in order to take appropriate measures. This paper presents COVID-19 Retrieval Augmented and Fine-Tuning (RAFT), a novel framework for analysing COVID-19 vaccine tweets through retrieval-augmented approaches. The framework integrates domain-specific knowledge via retrieval-augmented generation with external knowledge sources, and employs a transformer-based semantic approach for embedding generation backed by a vector database. Furthermore, the framework exhibits generalizability when integrated with domain knowledge. It uses parameter-efficient fine-tuning with quantization to run a large language model with a reduced number of parameters, allowing the model to be deployed on resource-constrained devices. The framework achieved an accuracy of 0.886 on a Twitter dataset of tweets specific to the Indian region and 0.912 on a Twitter dataset of tweets from the global region.
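The retrieval-augmented flow the abstract describes can be sketched in miniature: embed a tweet, retrieve the closest domain snippet from a vector store, and prepend it as context before sentiment classification. The sketch below uses toy bag-of-words embeddings and an in-memory list in place of the paper's transformer-based embeddings and vector database; all function names, the sample knowledge snippets, and the prompt format are illustrative assumptions, not details from the paper.

```python
# Minimal retrieval-augmented sentiment prompt construction.
# Toy stand-ins: bag-of-words embeddings and an in-memory "vector
# database" (the paper uses transformer embeddings and a real store).
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for the external domain-knowledge source (hypothetical snippets).
knowledge_base = [
    "covid-19 vaccines reduce severe illness and hospitalisation",
    "mild side effects such as fever are common after vaccination",
]
kb_vectors = [embed(doc) for doc in knowledge_base]

def retrieve(query: str) -> str:
    """Return the knowledge snippet most similar to the query."""
    q = embed(query)
    scores = [cosine(q, v) for v in kb_vectors]
    return knowledge_base[scores.index(max(scores))]

def build_prompt(tweet: str) -> str:
    """Augment the tweet with retrieved domain context before
    handing it to a (fine-tuned) sentiment model."""
    context = retrieve(tweet)
    return f"Context: {context}\nTweet: {tweet}\nSentiment:"

prompt = build_prompt("got a fever after my vaccine dose")
```

In the full framework, the augmented prompt would then be scored by the quantized, parameter-efficiently fine-tuned language model; the retrieval step is what grounds the model in domain knowledge and curbs hallucination.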
