A Random Degradation Aggregation Network With Temporal-Spatial Attention for Satellite Video Super-Resolution
Lu Li, Mi Wang, Y. Pi
TLDR
This work constructs a high-order video degradation model tailored to satellite videos, better reflecting their real-world degradation processes, and introduces an optical flow-guided deformable convolution module for aligning sequential frames, integrated with a temporal-spatial attention mechanism to effectively fuse temporal and spatial features.
Abstract
In recent years, video super-resolution (VSR) has achieved significant progress. However, most existing VSR methods rely on fixed and known degradation models, such as bicubic downsampling, which limits their effectiveness in real-world scenarios characterized by diverse and unknown degradations. Moreover, current VSR algorithms are primarily designed for natural scene videos and fail to account for the unique characteristics of satellite videos, leading to suboptimal reconstruction performance when directly applied to them. To address these challenges, we propose an innovative random degradation aggregation network with temporal-spatial attention for satellite VSR. Specifically, we first construct a high-order video degradation model tailored to satellite videos, better reflecting their real-world degradation processes. Second, we employ a high-order grid propagation mechanism combined with a bidirectional recurrent neural network to propagate extracted features across the entire video sequence. Finally, we introduce an optical flow-guided deformable convolution module for aligning sequential frames, integrated with a temporal-spatial attention mechanism to effectively fuse temporal and spatial features. We evaluate our method on three mainstream satellite video datasets (Luojia3-01, Jilin-1, and Zhuhai-1) with upscaling factors of ×2, ×3, and ×4. Performance is assessed using objective metrics (PSNR and SSIM) as well as subjective visual quality. Extensive experiments demonstrate that our method offers a balanced tradeoff between reconstruction quality and efficiency, achieving competitive performance across multiple satellite video datasets with a moderate parameter count, FLOPs, and memory usage.
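To make the idea of a high-order random degradation concrete, the following is a minimal sketch, not the authors' model. It assumes that "high-order" means repeating a randomly parameterized blur, downsample, and noise stage several times, and that one set of random parameters is shared by all frames of a clip. The function names (`gaussian_kernel_1d`, `blur`, `degrade_clip`), the parameter ranges, and the choice of Gaussian blur and Gaussian noise are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(frame, sigma):
    """Separable Gaussian blur of a 2-D frame with reflect padding."""
    radius = max(1, int(np.ceil(3 * sigma)))
    k = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(frame, radius, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def degrade_clip(clip, scales=(2, 2), seed=None):
    """Apply one random blur/downsample/noise stage per entry in `scales`.

    clip: array of shape (T, H, W) with values in [0, 1].
    scales: per-stage integer downsampling factors; (2, 2) gives x4 overall.
    All ranges below are assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    out = clip.astype(np.float64)
    for s in scales:
        # Assumption: parameters are drawn once per stage and shared by all frames.
        sigma = rng.uniform(0.2, 2.0)
        noise_std = rng.uniform(0.0, 0.02)
        out = np.stack([blur(f, sigma)[::s, ::s] for f in out])
        out = out + rng.normal(0.0, noise_std, size=out.shape)
        out = np.clip(out, 0.0, 1.0)
    return out
```

For example, calling `degrade_clip` on a clip of shape `(5, 32, 32)` with `scales=(2, 2)` yields a low-resolution clip of shape `(5, 8, 8)`, which could serve as a synthetic training input for the ×4 setting. A real pipeline would likely add further degradation types (for instance compression artifacts or sensor-specific blur), which are left out here.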
