Post-Stroke Hand Gesture Recognition via One-Shot Transfer Learning Using Prototypical Networks

Background

Stroke is one of the leading causes of death and disability worldwide, with the total number of stroke patients increasing globally due to population aging and urbanization. Although advances in treatment have reduced mortality rates, the number of survivors requiring rehabilitation has increased significantly. This is particularly evident in low-income and lower-middle-income countries where healthcare resources are limited, creating an urgent need for adaptive and cost-effective rehabilitation interventions (Feigin et al. 2022).

Stroke rehabilitation is a long and burdensome process, consuming both physical and financial resources. Therefore, the importance of automated assessment systems in reducing rehabilitation costs and the need for physical therapist visits is becoming increasingly apparent. These systems assess the motor functions of stroke survivors through sensor data, providing a low-cost method for interactive rehabilitation exercises, especially suitable for home rehabilitation (Chen et al. 2017). Additionally, incorporating games into these systems can increase motivation and engagement among stroke survivors by allowing them to engage in interesting, repetitive movements or tasks that promote rehabilitation (Proffitt et al. 2011).

Challenges in Automated Rehabilitation Assessment Systems

Currently available automated rehabilitation assessment systems typically use sensors such as electromyography (EMG), force myography (FMG), and inertial measurement units (IMU) to collect data. However, sensor displacement and individual differences among wearers significantly affect classifier performance. Moreover, the physiological signals of stroke survivors differ significantly from those of healthy individuals, making data interpretation more complex (Ao et al. 2013; Zhang et al. 2016; Raghavan 2015).

Unlike traditional human-operated assessment systems, automated systems need to consider participants’ activity capabilities and develop sensor designs and analysis techniques that accommodate the unique needs and characteristics of stroke patients. Therefore, this study proposes using prototypical networks for one-shot transfer learning, optimizing feature selection, and increasing window size to improve the classification accuracy and applicability of models in home rehabilitation systems.

Paper Source

This paper was written by Hussein Sarwat, Amr AlKhashab, Xinyu Song, Shuo Jiang, Jie Jia, and Peter B. Shull, from institutions including the School of Mechanical Engineering at Shanghai Jiao Tong University and Huashan Hospital of Fudan University. It was published in the Journal of NeuroEngineering and Rehabilitation in 2024 (21:100).

Research Design and Methods

Research Subjects and Sensor Configuration

Data was collected from 20 stroke patients with Brunnstrom hand stages between 2 and 6. The trials were conducted at the Department of Rehabilitation Medicine at Huashan Hospital in Shanghai, China. Informed consent was obtained from all participants before the experiment, and approval was granted by the Ethics Review Committee of Huashan Hospital.

Two types of wearable sensors were used to collect participant data: a wristband equipped with one IMU and eight air pressure sensors placed on the wrist, and another wristband with six EMG sensors placed on the forearm, about 10 cm from the elbow. These sensors were used to collect gesture data (e.g., gross flexion, gross extension, wrist flexion, wrist extension, forearm pronation, forearm supination, and rest).

Data Processing and Feature Extraction

Data was collected and processed using MATLAB and Python. After filtering, the data was normalized using the mean and standard deviation of each trial, then segmented into overlapping windows with a window size of 222 ms and a step size of 55.6 ms. For feature extraction, a total of 394 features were extracted from each sensor channel, including time-domain, frequency-domain, and time-frequency domain features, providing sufficient information for the classification models.
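The normalization and segmentation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the use of the population standard deviation, and the 1000 Hz sampling rate in the usage lines are all assumptions, since the text does not state them.

```python
from statistics import mean, pstdev


def zscore_trial(trial):
    """Normalize one channel of one trial by its own mean and standard deviation."""
    mu = mean(trial)
    sigma = pstdev(trial)  # assumption: population standard deviation
    if sigma == 0:
        # A constant signal carries no variation; map it to zeros.
        return [0.0 for _ in trial]
    return [(x - mu) / sigma for x in trial]


def ms_to_samples(duration_ms, sampling_rate_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return max(1, round(duration_ms * sampling_rate_hz / 1000))


def sliding_windows(signal, window_len, step):
    """Split a signal into overlapping windows of window_len samples, advancing by step."""
    if window_len <= 0 or step <= 0:
        raise ValueError("window_len and step must be positive")
    last_start = len(signal) - window_len
    return [signal[s:s + window_len] for s in range(0, last_start + 1, step)]


# Hypothetical sampling rate; the actual value is not given in the text.
SAMPLING_RATE_HZ = 1000
window_len = ms_to_samples(222, SAMPLING_RATE_HZ)
step = ms_to_samples(55.6, SAMPLING_RATE_HZ)
```

Because the step is about a quarter of the window length, adjacent windows overlap by roughly 75%. Each window would then be passed to the feature extraction stage.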

Classifiers and Model Evaluation

Two main types of models were involved in this study: subject-independent models and models trained through transfer learning. The former were trained using leave-one-subject-out cross-validation, while the latter used few-shot learning methods, using data from a new participant to retrain the model. Prototypical networks were employed and compared with traditional machine learning methods (such as SVM, LDA, and LGBM) and neural network methods.
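Leave-one-subject-out cross-validation holds out all samples from one participant for testing and trains on everyone else, repeating this once per participant. The sketch below shows only the index bookkeeping; the function name and the subject labels are hypothetical, and it makes no claim about how the authors organized their data.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) once per unique subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical example: five windows recorded from three participants.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
```

With 20 participants this procedure produces 20 folds, and the per-fold accuracies are what a mean and standard deviation across subjects would summarize.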

Among these models, prototypical networks were used for few-shot learning, improving model accuracy through a single sample from a new participant. Samples are divided into query sets and support sets; a prototype is computed for each class from the support set, and each query sample is classified by comparing its embedding with the prototypes.
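The classification step of a prototypical network can be illustrated as follows. This sketch assumes the embeddings have already been produced by a trained network and are plain lists of floats; it takes each prototype as the mean of its class's support embeddings and uses Euclidean distance, which is the usual choice but is not confirmed by the text. The function names and gesture labels are hypothetical.

```python
import math
from collections import defaultdict


def compute_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class to obtain one prototype per class."""
    grouped = defaultdict(list)
    for embedding, label in zip(support_embeddings, support_labels):
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def classify(query_embedding, prototypes):
    """Assign the query to the class whose prototype is nearest (assumed Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query_embedding, prototypes[label]))


# One-shot case: a single support embedding per gesture from the new participant.
support = [[0.0, 0.0], [1.0, 1.0]]
labels = ["rest", "wrist_flexion"]
prototypes = compute_prototypes(support, labels)
prediction = classify([0.9, 0.8], prototypes)
```

In the one-shot setting each prototype is simply the embedding of the single support sample, so adapting to a new participant requires no gradient updates at this step.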

Experimental Results Analysis

The results show that the one-shot prototypical network (PN) method with large windows significantly improved classification accuracy (p < 0.05), reaching 82.20% ± 10.85%, notably higher than other subject-independent classifiers and traditional transfer learning methods. The attached figure shows the prediction accuracy for different gestures.

Feature Selection and Dimensionality Reduction

Different classifiers responded differently to feature selection and dimensionality reduction methods. K-Best feature selection significantly improved the performance of PN, transfer learning (TL), and SVM. PCA performed well with LDA but was less effective with the other classifiers.
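K-Best selection scores every feature independently and keeps the k highest-scoring ones. The sketch below uses a one-way ANOVA F-statistic as the score, which is an assumption: the text does not say which scoring function was used. The function names and the toy data are hypothetical, and the input needs at least two classes and more samples than classes.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across classes."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variance explained by class membership versus variance left within classes.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_k_best(X, y, k):
    """Return the column indices of the k features with the highest F-statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.1, 3.0], [3.0, 5.1], [3.1, 2.9]]
y = ["a", "a", "b", "b"]
selected = select_k_best(X, y, 1)
```

The scores should be computed on training data only, so that the held-out participant's data does not influence which features are kept.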

Impact of Window Size

Increasing window size improved accuracy for all classifiers, despite reducing the number of samples. The most significant improvements were seen in SVM and LGBM, with accuracy increases of 6.48% and 6.34% respectively. Averaged across all classifiers, accuracy increased by about 4.28%.

Impact of Sample Size

When five-shot learning was compared with one-shot learning using large windows, the latter performed better despite using fewer samples.

Comparison of Subject-Dependent Models

The proposed method performed similarly to subject-dependent models, indicating good robustness when adapting large-scale models to new participants.

Research Significance and Conclusions

This paper proposes a one-shot transfer learning method based on prototypical networks to optimize gesture recognition in home rehabilitation systems. These improvements can enhance device performance without frequent supervision. Additionally, the paper explores the importance of time segmentation and feature selection. Results show that extending time segments can improve performance but at the cost of real-time capability, while the effect of feature selection varies by classifier, either significantly improving or degrading model performance. It is therefore crucial to consider dimensionality reduction techniques that remove noise while retaining necessary information before the data is passed to the model.

This research lays a solid foundation for the subsequent development of adaptive and accurate home rehabilitation systems, providing better rehabilitation support for stroke patients.