Multistage attention-based extraction and fusion of protein sequence and structural features for protein function prediction

TLDR

MAEF-GO (Multi-stage Attention-based Extraction and Fusion model for GO prediction) is a novel framework that uses a multistage attention mechanism to predict protein functions and integrates a graph convolutional network with a graph attention network to extract protein structural features.

Abstract

Motivation
Protein function prediction is important for drug development and disease treatment. Recent deep learning methods that leverage protein sequence and structural information have made remarkable progress in this field. However, existing methods overlook the complex multimodal interactions between sequence and structural features. Because sequence and structure reveal a protein's functional characteristics from different perspectives, fusing these two modalities effectively, so as to characterize protein function more comprehensively, remains challenging. In addition, current methods struggle to capture long-range dependencies and global contextual information in protein sequences during feature extraction, which limits the model's ability to recognize critical functional residues.

Results
In this study, we propose a novel framework, the Multi-stage Attention-based Extraction and Fusion model for GO prediction (MAEF-GO), which uses a multistage attention mechanism to predict protein functions. MAEF-GO integrates a graph convolutional network and a graph attention network to extract protein structural features. To model long-range dependencies within protein sequences, we introduce a frequency-domain attention mechanism that captures global contextual relationships. Additionally, a cross-attention module enables interactive fusion of the sequence and structure modalities. Experimental evaluations show that MAEF-GO outperforms several state-of-the-art baseline models on standard benchmarks. Furthermore, analysis of the cross-attention weight distributions demonstrates MAEF-GO's interpretability: the model can effectively identify critical functional residues of proteins.
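
The abstract does not give the layer equations, so the following is only a minimal sketch of the two attention stages it names, under stated assumptions. The frequency-domain stage is modelled here as a global-filter layer of the kind used in GFNet (transform along the residue axis, multiply element-wise by a filter, transform back); the cross-attention stage is standard scaled dot-product attention in which per-residue sequence features act as queries and per-residue structure features supply keys and values. All function names, the filter and the projection matrices are hypothetical, a naive DFT stands in for an FFT, and none of this should be read as MAEF-GO's actual implementation, which is in the repository linked below.

```python
import cmath
import math

# Hypothetical illustration only; not the MAEF-GO implementation.
Matrix = list[list[float]]
CMatrix = list[list[complex]]


def dft_over_residues(x: Matrix) -> CMatrix:
    """Naive O(L^2) DFT of every feature channel along the residue axis."""
    n, d = len(x), len(x[0])
    return [[sum(x[t][c] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
             for c in range(d)] for k in range(n)]


def inverse_dft_real(f: CMatrix) -> Matrix:
    """Inverse DFT along the residue axis, keeping only the real part
    (a simplification: exact only if the filter is conjugate-symmetric)."""
    n, d = len(f), len(f[0])
    return [[(sum(f[k][c] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
             for c in range(d)] for t in range(n)]


def frequency_domain_mixing(x: Matrix, filt: CMatrix) -> Matrix:
    """Assumed global-filter layer: DFT, element-wise filter, inverse DFT.

    Each output residue depends on every input residue, so long-range
    context is mixed in one step instead of through stacked local windows.
    """
    f = dft_over_residues(x)
    filtered = [[f[k][c] * filt[k][c] for c in range(len(f[0]))] for k in range(len(f))]
    return inverse_dft_real(filtered)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(seq: Matrix, struct: Matrix, wq: Matrix, wk: Matrix,
                    wv: Matrix) -> tuple[Matrix, Matrix]:
    """Sequence features query structure features: softmax(Q K^T / sqrt(d_k)) V.

    Returns the fused features and the attention weights (one row per
    sequence residue, one column per structure position).
    """
    q, k, v = matmul(seq, wq), matmul(struct, wk), matmul(struct, wv)
    d_k = len(k[0])
    scores = [[sum(qi[c] * kj[c] for c in range(d_k)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]     # 4 residues x 2 channels (toy values)
    struct = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5], [0.3, 0.3]]  # stand-in for graph-encoder outputs
    all_pass = [[1 + 0j] * 2 for _ in range(4)]                # identity filter; a model would learn it
    seq_ctx = frequency_domain_mixing(seq, all_pass)
    eye = [[1.0, 0.0], [0.0, 1.0]]
    fused, weights = cross_attention(seq_ctx, struct, eye, eye, eye)
    for i, row in enumerate(weights):
        print(f"residue {i} attends to structure positions:", [round(w, 3) for w in row])
```

Keeping only the zero-frequency coefficient in the filter turns every output residue into the per-channel mean of the whole sequence, which makes the "global context" property easy to check. Inspecting rows of `weights` is loosely analogous to the cross-attention analysis the abstract uses for interpretability, though the paper's exact procedure is not described here.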

Availability and implementation
The MAEF-GO source code is available at https://github.com/nebstudio/MAEF-GO; an archived snapshot of the code used in this study is also available via Zenodo at https://doi.org/10.5281/zenodo.15422392.