ADFCNN: Attention-Based Dual-Scale Fusion Convolutional Neural Network for Motor Imagery Brain–Computer Interface

Brain-Computer Interface (BCI) has emerged as an enhanced communication and control technology in recent years. In BCIs based on electrophysiological signals such as the Electroencephalogram (EEG), Motor Imagery (MI) is an important paradigm that decodes users' motor intentions for clinical rehabilitation, intelligent wheelchair control, cursor control, and other applications. However, because EEG signals have a low Signal-to-Noise Ratio (SNR), are non-stationary, and offer low spatial but high temporal resolution, accurately decoding motor intentions remains challenging. Existing MI-BCI decoding relies mainly on traditional machine learning and deep learning. Traditional machine learning typically consists of two separate steps, feature extraction and feature classification, using methods such as the Fast Fourier Transform (FFT), Common Spatial Patterns (CSP), and the Wavelet Transform (WT). These methods require extensive expert knowledge, which limits classification performance. In contrast, deep learning, with its powerful representation-learning capability, has achieved significant success in the BCI field.

In recent years, Convolutional Neural Networks (CNNs) have shown great promise in MI-based BCIs. However, a single-scale CNN is limited in capturing the broadband information contained in EEG signals, and typical multi-scale CNNs fall short when fusing the information extracted at different scales. To address these issues, this study proposes a novel Attention-Based Dual-Scale Fusion Convolutional Neural Network (ADFCNN), which jointly extracts EEG spectral and spatial information at two different scales and fuses it effectively through a self-attention mechanism.

Research Origin

This paper is authored by Wei Tao, Ze Wang, Chi Man Wong, Ziyu Jia, Chang Li, Xun Chen, C. L. Philip Chen, and Feng Wan, affiliated with the University of Macau, Macau University of Science and Technology, the Institute of Automation of the Chinese Academy of Sciences, Hefei University of Technology, the University of Science and Technology of China, and South China University of Technology. It was published in January 2024 in IEEE Transactions on Neural Systems and Rehabilitation Engineering.

Detailed Research Process

a) Research Workflow

  1. Dataset Description and Preprocessing:

    • Dataset Description: This study uses three public datasets: BCI Competition IV 2a, BCI Competition IV 2b, and OpenBMI. They differ in the number of subjects, sampling rate, and electrode configuration as follows:
      1. BCI Competition IV 2a (BCI-IV2a): collected from 9 healthy subjects; 576 trials per subject, 22 electrodes, 250 Hz sampling rate.
      2. BCI Competition IV 2b (BCI-IV2b): collected from 9 subjects; at least 320 trials per subject, 3 electrodes, 250 Hz sampling rate.
      3. OpenBMI dataset: collected from 54 subjects; at least 200 trials per subject, 62 electrodes, 1000 Hz sampling rate.
    • Preprocessing:
      • Each EEG trial is represented as x ∈ R^(c×t), where c is the number of electrodes and t is the number of sampling points.
      • Raw EEG signals were down-sampled to 250 Hz. For the OpenBMI dataset, a 0–40 Hz band-pass filter was applied to retain the main EEG frequency bands.
      • Electrode-wise exponential moving standardization was applied to standardize the EEG data (a minimal preprocessing sketch is given after this list).
  2. Model Structure:

    • Dual-Scale Spatio-Temporal Convolutional Neural Network:
      • Branch-I: Includes large-scale temporal convolution layers, large-scale spatial separable convolution layers, and pointwise convolution layers to extract large-scale spectral information and global spatial information.
      • Branch-II: Includes small-scale temporal convolution layers and standard spatial convolution layers to capture small-scale high-frequency information and detailed spatial information.
    • Attention Mechanism: A self-attention mechanism fuses the features extracted by the two CNN branches, adaptively weighting their contributions (see the model sketch after this list).
    • Dense Layer and Softmax Layer: Used for the final classification output.
  3. Experimental Setup:

    • Five-fold cross-validation and comparison with multiple deep learning benchmark models.
    • Statistical comparison of results using the Wilcoxon rank-sum test (see the evaluation sketch after this list).
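
The following is a minimal preprocessing sketch of the pipeline described in step 1, not the authors' code. It assumes raw trials stored as NumPy arrays of shape (channels, samples); the Butterworth filter order, the 0.5 Hz low cut-off (a band-pass cannot start at exactly 0 Hz), and the smoothing factor of the exponential moving standardization are illustrative choices rather than values taken from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt, resample_poly


def bandlimit(x, fs, low=0.5, high=40.0, order=4):
    """Band-limit each channel to roughly 0-40 Hz (illustrative 0.5 Hz low cut)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x, axis=-1)


def exponential_moving_standardize(x, factor_new=1e-3, eps=1e-4):
    """Electrode-wise exponential moving standardization with running mean/variance."""
    out = np.empty(x.shape, dtype=float)
    mean = x[:, 0].astype(float)
    var = np.ones(x.shape[0])
    for t in range(x.shape[1]):
        mean = (1 - factor_new) * mean + factor_new * x[:, t]
        var = (1 - factor_new) * var + factor_new * (x[:, t] - mean) ** 2
        out[:, t] = (x[:, t] - mean) / np.maximum(np.sqrt(var), eps)
    return out


def preprocess_trial(x, fs_in, fs_out=250):
    """Down-sample one (channels, samples) trial to 250 Hz, band-limit it, then standardize."""
    if fs_in != fs_out:
        x = resample_poly(x, fs_out, fs_in, axis=-1)  # e.g. OpenBMI: 1000 Hz -> 250 Hz
    x = bandlimit(x, fs_out)
    return exponential_moving_standardize(x)
```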
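
Below is a compact PyTorch sketch of the dual-scale architecture and attention-based fusion described in step 2. It is not the authors' implementation: the filter counts, kernel sizes, pooling sizes, and the token layout fed to the attention layer are illustrative assumptions; only the overall structure (two spatio-temporal branches, self-attention fusion, dense layer with softmax folded into the loss) follows the summary above.

```python
import torch
import torch.nn as nn


class ADFCNNSketch(nn.Module):
    """Illustrative dual-scale spatio-temporal CNN with self-attention fusion (not the original model)."""

    def __init__(self, n_channels=22, n_samples=1000, n_classes=4, f=8):
        super().__init__()
        # Branch I: large-scale temporal conv, separable (depthwise + pointwise) spatial conv.
        self.branch1 = nn.Sequential(
            nn.Conv2d(1, f, (1, 64), padding=(0, 32), bias=False),   # large temporal kernel
            nn.BatchNorm2d(f),
            nn.Conv2d(f, f, (n_channels, 1), groups=f, bias=False),  # depthwise spatial conv
            nn.Conv2d(f, f, (1, 1), bias=False),                     # pointwise conv
            nn.BatchNorm2d(f), nn.ELU(), nn.AvgPool2d((1, 32)), nn.Dropout(0.5),
        )
        # Branch II: small-scale temporal conv, standard spatial conv.
        self.branch2 = nn.Sequential(
            nn.Conv2d(1, f, (1, 16), padding=(0, 8), bias=False),    # small temporal kernel
            nn.BatchNorm2d(f),
            nn.Conv2d(f, f, (n_channels, 1), bias=False),            # standard spatial conv
            nn.BatchNorm2d(f), nn.ELU(), nn.AvgPool2d((1, 32)), nn.Dropout(0.5),
        )
        # Self-attention fusion over the time-step tokens produced by both branches.
        self.attn = nn.MultiheadAttention(embed_dim=f, num_heads=1, batch_first=True)
        with torch.no_grad():
            n_feat = self._fuse(torch.zeros(1, 1, n_channels, n_samples)).numel()
        self.classifier = nn.Sequential(nn.Flatten(), nn.Linear(n_feat, n_classes))

    def _fuse(self, x):
        # Each branch yields (batch, f, 1, t'); use its time steps as tokens and concatenate.
        t1 = self.branch1(x).squeeze(2).transpose(1, 2)   # (batch, t1', f)
        t2 = self.branch2(x).squeeze(2).transpose(1, 2)   # (batch, t2', f)
        tokens = torch.cat([t1, t2], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)      # self-attention fusion
        return fused

    def forward(self, x):                                  # x: (batch, 1, channels, samples)
        return self.classifier(self._fuse(x))              # logits; softmax folded into the loss
```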
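
Finally, a short sketch of the evaluation protocol in step 3: five-fold cross-validation and a Wilcoxon rank-sum test between per-subject accuracies of two models. The helper train_and_score and the accuracy arrays are hypothetical placeholders, not results from the paper.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.model_selection import KFold


def cross_validate(X, y, train_and_score, n_splits=5, seed=0):
    """Average accuracy of `train_and_score(X_tr, y_tr, X_te, y_te)` over five folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = [train_and_score(X[tr], y[tr], X[te], y[te]) for tr, te in kf.split(X)]
    return float(np.mean(scores))


# Statistical comparison between two models' per-subject accuracies (placeholder numbers).
acc_adfcnn = np.array([0.81, 0.78, 0.83, 0.75, 0.80])
acc_baseline = np.array([0.72, 0.70, 0.76, 0.69, 0.71])
stat, p_value = ranksums(acc_adfcnn, acc_baseline)
print(f"Wilcoxon rank-sum statistic = {stat:.3f}, p = {p_value:.4f}")
```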

b) Major Results

The experimental results show that ADFCNN achieves excellent classification performance on all three public datasets: the average accuracy across subjects is 79.39% on BCI-IV2a (a significant improvement of 9.14%), 87.81% on BCI-IV2b (an improvement of 7.66%), and 65.26% on OpenBMI (an improvement of 7.2%). In addition, ablation experiments and visualization analyses further verify the effectiveness of the dual-scale spatio-temporal CNN and the self-attention fusion module.

c) Conclusion and Significance

This study proposes an attention-based dual-scale fusion convolutional neural network that significantly improves MI classification performance by simultaneously extracting and fusing EEG spectral and spatial information at different scales. This method not only overcomes the limitations of single-scale CNNs in handling EEG signals but also effectively fuses information from different scales through a self-attention mechanism, offering an innovative decoding strategy with broad prospects for BCI applications. Future research can explore the applicability and adaptability of this method in cross-subject tasks.

d) Research Highlights

  1. Innovative Method: The proposed attention-based dual-scale fusion convolutional neural network (ADFCNN) is innovative in EEG signal processing and feature fusion.
  2. Significant Performance Improvement: ADFCNN demonstrates significantly improved classification performance in MI tasks compared to existing multi-scale CNN methods.
  3. Visualization Analysis: The visualization of convolution kernels and the self-attention mechanism deepens the understanding of the model learning process and feature distribution.

e) Additional Information