Fourier-Based Global-Local Joint Perception Spatiotemporal Prediction Model
Graphical Abstract
Abstract
Spatiotemporal sequence forecasting aims to generate future image data by learning from historical context; the challenge lies in capturing the complex spatial correlations and temporal evolution of the physical world. Current research often targets specific tasks and tends to emphasize overall trends at the expense of local detail, while recurrent units that cannot be parallelized remain computationally inefficient. To address these issues, this paper proposes a universal spatiotemporal data prediction model. It first integrates local and global spatial features: Fourier-based transformations capture global dependencies, which are then fused with the local relations extracted by a Swin Transformer, achieving joint global-local spatial awareness. A multi-scale fully convolutional module then extracts temporal features at varying scales, and a further Fourier transform maps the temporal stack from the time domain to the frequency domain, capturing features across its continuous evolution. Experimental results on the SEVIR, KTH, and MovingMNIST datasets demonstrate the model's generalizability, effectiveness, and scalability: it preserves long-term dependencies while improving computational efficiency.
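The global-local fusion described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed (rather than learned) frequency-domain weights, and the simple weighted-sum fusion are all illustrative assumptions; in the actual model the frequency weights would be learnable parameters and the local branch would be a Swin Transformer.

```python
import numpy as np

def fft_global_mix(x):
    """Hypothetical Fourier-based global mixing over a (H, W, C) feature map.

    Transform spatial axes to the frequency domain, modulate each frequency
    (a stand-in for learned complex weights; identity here), and invert.
    Because every output pixel depends on every input pixel, the operation
    has a global receptive field.
    """
    freq = np.fft.fft2(x, axes=(0, 1))
    weights = np.ones_like(freq)  # placeholder for learned per-frequency weights
    return np.fft.ifft2(freq * weights, axes=(0, 1)).real

def fuse(global_feat, local_feat, alpha=0.5):
    """Toy weighted fusion of global (Fourier) and local branch features."""
    return alpha * global_feat + (1.0 - alpha) * local_feat

# Example: an 8x8 feature map with 4 channels.
x = np.random.rand(8, 8, 4)
local = x  # stand-in for Swin Transformer local features
out = fuse(fft_global_mix(x), local)
```

With identity frequency weights the global branch reduces to the input (ifft of fft), so the sketch only demonstrates the data flow; a learned weight tensor would selectively amplify or suppress frequencies, which is how Fourier-based mixers capture global dependencies at O(HW log HW) cost rather than the quadratic cost of full attention.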