LLM Based Biological Named Entity Recognition from Scientific Literature
Sung Jae Jung, Hajung Kim, Kyoung Sang Jang
Abstract
The application of Large Language Models (LLMs) to natural language processing has grown rapidly in recent years, and in bioinformatics it has made it possible to automate the extraction of biological entities from scientific literature. This study presents the development and evaluation of a Biological Named Entity Recognizer (BNER) built on a pre-trained LLM and refined through prompt engineering. The BNER was tailored to identify proteins, genes, and small molecules in scientific texts, specifically in the context of p53 protein-related research. To assess the BNER's efficacy, we curated a dataset of ten paragraphs extracted from the abstracts and significant sections of five high-relevance scientific papers. On the entity recognition task, the system produced 51 true positives (TP), 10 false positives (FP), and 3 false negatives (FN), corresponding to a precision of 0.836 (51/61), a recall of 0.944 (51/54), and an F1 score of 0.887. These results support the utility of LLMs in bioinformatics and highlight the BNER's potential to support and accelerate scientific discovery by providing accurate, structured data outputs suitable for comprehensive analysis.
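The reported F1 score can be checked directly from the TP, FP, and FN counts given in the abstract. The following is a minimal sketch of that arithmetic using the standard definitions of precision, recall, and F1; the function name `precision_recall_f1` is a hypothetical helper introduced for illustration and is not taken from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw entity counts.

    Hypothetical helper for illustration; uses the standard definitions:
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      F1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported in the abstract: 51 TP, 10 FP, 3 FN.
precision, recall, f1 = precision_recall_f1(tp=51, fp=10, fn=3)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# prints: precision=0.836 recall=0.944 F1=0.887
```

The counts imply that recall (0.944) is noticeably higher than precision (0.836): on this small dataset the recognizer misses few entities but produces more spurious ones.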
