A Transformer-Based Approach Combining Deep Learning Network and Spatial-Temporal Information for Raw EEG Classification

Research Background and Objectives

In recent years, Brain-Computer Interface (BCI) systems have been widely applied in neuroengineering and neuroscience. Electroencephalography (EEG), which reflects the activity of different neuronal populations in the central nervous system, has become a core research topic in these fields. However, EEG signals are characterized by low spatial resolution, high temporal resolution, a low signal-to-noise ratio, and substantial inter-individual variability, all of which make signal processing and accurate classification challenging. This is especially true for the Motor Imagery (MI) paradigm commonly used in EEG-BCI systems, where accurately classifying EEG signals for different MI tasks is essential both for the functionality of BCI systems and for motor rehabilitation.

Traditional MI-EEG classification methods are usually based on manual feature extraction followed by classification, but handcrafted features may discard useful information in the EEG during the extraction stage. In recent years, deep learning models have gained widespread adoption thanks to their ability to automatically extract rich feature representations. However, existing deep learning methods, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have limited capability to capture global dependencies when processing EEG data.

The Transformer model, with its strong feature extraction and association capabilities, has performed well in fields such as Natural Language Processing (NLP). However, it has not been extensively studied for motor imagery EEG classification and visualization, particularly for general models evaluated with cross-subject validation. To address these issues, the authors propose an EEG classification method based on the Transformer model, combined with deep learning networks and spatiotemporal information.

Authors and Affiliations

The paper is authored by Jin Xie, Jie Zhang, Jiayao Sun, Zheng Ma, Liuni Qin, Guanglin Li, Huihui Zhou, and Yang Zhan. The primary authors are affiliated with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, along with other institutions, including the Shenzhen Key Laboratory. The paper was published in 2022 in the journal "IEEE Transactions on Neural Systems and Rehabilitation Engineering" and was supported by several projects, including the National Key Research and Development Program of China and the National Natural Science Foundation of China.

Research Process

Dataset and Preprocessing

The study uses the PhysioNet EEG Motor Movement/Imagery dataset, which includes data from 109 subjects with over 1,500 trials. Data were recorded from 64 electrodes at a sampling rate of 160 Hz. The study focuses on motor imagery classification, selecting motor imagery data for the left fist, right fist, both fists, and both feet. Preprocessing includes z-score normalization and the addition of random noise for data augmentation to mitigate overfitting.
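The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the paper's exact pipeline: the noise distribution and its scale (`noise_std`) are assumptions, since the summary only states that random noise is added.

```python
import numpy as np

def preprocess_trial(eeg, noise_std=0.1, rng=None):
    """Z-score each EEG channel over time, then add random noise.

    eeg: array of shape (n_channels, n_samples), e.g. (64, 640)
    noise_std: hypothetical augmentation strength (not specified in the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = eeg.mean(axis=1, keepdims=True)
    std = eeg.std(axis=1, keepdims=True) + 1e-8  # avoid division by zero
    normalized = (eeg - mean) / std
    # Gaussian noise as a stand-in for the unspecified "random noise"
    return normalized + rng.normal(0.0, noise_std, size=eeg.shape)
```

With `noise_std=0.0` the function reduces to plain per-channel z-scoring, which makes the two steps easy to test independently.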

Model Architecture

The study designed five Transformer-based models, including Spatial Transformer (S-Trans), Temporal Transformer (T-Trans), CNN combined with Spatial Transformer (S-CTrans) and Temporal Transformer (T-CTrans), and a fusion model (F-CTrans).

Transformer Module

The Transformer module adopts an encoder-decoder structure, extracting information by stacking self-attention mechanisms with point-wise fully connected layers. The study used eight parallel attention heads: the input EEG data are projected into query, key, and value vectors, and attention weights computed from the queries and keys are used to form weighted sums of the values.
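The multi-head attention computation described above can be sketched in NumPy. This is a generic scaled dot-product attention illustration under standard Transformer conventions, not the paper's specific implementation; the projection matrices `Wq`, `Wk`, `Wv`, `Wo` are assumed learnable parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads=8):
    """Scaled dot-product attention with n_heads parallel heads.

    x: (seq_len, d_model) -- e.g. EEG channels or time steps as tokens
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices (assumed learnable)
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(h):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return h.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)   # per-head attention maps
    out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo, weights
```

The returned `weights` tensor is what the paper later visualizes: each head's map shows how strongly every token (electrode or time step) attends to every other.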

Position Embedding Module

Three types of position embedding are used: relative position encoding, channel-related position encoding, and learnable position encoding. Relative position encoding is computed with trigonometric (sine and cosine) functions; channel-related position encoding is based on the cosine distance of each electrode to a central electrode; and learnable position encoding embeds a trainable matrix.
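The first two encodings can be sketched as follows. The sinusoidal form follows the standard Transformer convention; the channel-related encoding is a hypothetical reading of "cosine distance with respect to the central electrode", using 3-D electrode coordinates and a reference electrode such as Cz (the paper's exact formula may differ).

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    """Standard sine/cosine relative position encoding."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])  # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])  # odd dimensions: cosine
    return pe

def channel_cosine_encoding(coords, center):
    """Hypothetical channel-related encoding: cosine similarity of each
    electrode's 3-D position to a central reference electrode."""
    norm = np.linalg.norm
    return np.array([np.dot(c, center) / (norm(c) * norm(center) + 1e-8)
                     for c in coords])
```

The learnable variant is simply a trainable `(seq_len, d_model)` matrix added to the input, so it needs no formula.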

CNN and Transformer Combined Model

The combined models handle spatial and temporal information separately: the CNN module extracts local features, which the Transformer then processes further for EEG classification. In the fusion model, spatial and temporal information are processed in parallel through CNN and Transformer submodules, and the resulting features are concatenated for classification.
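The parallel-branch structure of the fusion model can be sketched schematically. This is a deliberately simplified stand-in: each branch here is a single linear filtering step with mean pooling, whereas the paper's branches are full CNN-plus-Transformer submodules; all parameter shapes are illustrative assumptions.

```python
import numpy as np

def fusion_forward(eeg, spatial_filt, temporal_filt, W_cls):
    """Schematic forward pass of a two-branch spatio-temporal fusion model.

    eeg:           (n_channels, n_samples) raw trial
    spatial_filt:  (n_filters, n_channels)  -- mixes across electrodes
    temporal_filt: (kernel_len,)            -- filters along time
    W_cls:         (n_filters + n_channels, n_classes) linear classifier
    """
    # Spatial branch: combine electrodes, then pool over time
    spatial_feat = (spatial_filt @ eeg).mean(axis=1)
    # Temporal branch: filter each channel along time, then pool
    temporal = np.array([np.convolve(ch, temporal_filt, mode='valid')
                         for ch in eeg])
    temporal_feat = temporal.mean(axis=1)
    # Fusion: concatenate branch features and classify
    features = np.concatenate([spatial_feat, temporal_feat])
    return features @ W_cls
```

The design point this illustrates is that neither branch sees the other's intermediate representation; they only meet at the concatenation before the classifier.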

Training Settings

The study used the Adam optimizer with the number of training epochs set to 50, employing 5-fold cross-validation to assess model performance. For cross-subject training, subjects (rather than individual trials) were divided into disjoint training and test sets, so that generalization to unseen individuals could be evaluated.
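The cross-subject splitting scheme can be sketched as follows. Splitting by subject ID, not by trial, is the key point; the shuffling seed and fold construction here are illustrative assumptions.

```python
import numpy as np

def cross_subject_folds(subject_ids, n_folds=5, seed=0):
    """Yield (train_subjects, test_subjects) pairs for k-fold CV.

    Subjects, not trials, are partitioned, so no test subject's data
    ever appears in the training set of the same fold.
    """
    rng = np.random.default_rng(seed)
    ids = np.array(subject_ids)
    rng.shuffle(ids)
    folds = np.array_split(ids, n_folds)
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test
```

For the 109 subjects in this dataset, each fold holds out roughly 21-22 subjects for testing.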

Research Results

Classification Accuracy

The results show that the Transformer-based models perform strongly on two-class, three-class, and four-class classification tasks, achieving best accuracies of 83.31%, 74.44%, and 64.22%, respectively, outperforming other representative models. Classification accuracy improved further with the addition of the position embedding module.

Visualization Results

By visualizing the multi-head attention layers, the study found that the attention weights displayed patterns consistent with Event-Related Desynchronization (ERD) in sensorimotor regions. In particular, for left- and right-fist motor imagery tasks, attention weights were significantly enhanced over the contralateral hemisphere, in line with previous ERD findings based on spectral analysis, indicating that the Transformer model can reveal neural mechanisms underlying motor imagery tasks.
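One plausible way to reduce multi-head attention maps to a per-electrode score for such topographic visualization is sketched below. This averaging scheme is an assumption for illustration; the paper may aggregate heads and queries differently.

```python
import numpy as np

def electrode_attention_scores(attn_weights):
    """Collapse spatial attention maps to one score per electrode.

    attn_weights: (n_heads, n_channels, n_channels), the maps from a
    spatial Transformer layer where tokens are electrodes. Averaging
    over heads and over the query axis yields how much attention each
    electrode *receives*, which can then be plotted as a scalp map to
    compare, e.g., left- vs right-hemisphere weights during MI.
    """
    return attn_weights.mean(axis=(0, 1))  # shape: (n_channels,)
```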

Research Conclusions and Significance

This paper proposes a Transformer-based EEG classification method combining spatiotemporal information and deep learning networks, designing five different models for motor imagery tasks. The results indicate that the Transformer model performs strongly on EEG classification tasks, and the visualization results demonstrate its potential for revealing neural mechanisms within EEG data. The method not only has broad application prospects in BCI systems but can also be applied to disease diagnosis and other EEG-based classification tasks.