Self-Supervised Learning of Accelerometer Data Provides New Insights for Sleep and Its Association with Mortality

Insights into the Association Between Sleep and Mortality Revealed by Self-supervised Learning of Wrist-worn Accelerometer Data

Sleep is essential to life. Accurately measuring and classifying sleep/wake states and sleep stages is crucial for diagnosing sleep disorders in clinical research and for interpreting the movement and mental health data that consumer devices provide. However, existing sleep classification techniques that do not use polysomnography (PSG) rely mainly on heuristic approaches, which were often developed in relatively small samples and are limited as a result. This study therefore aims to determine how accurately sleep stages can be classified from wrist-worn accelerometers and to explore the association of sleep duration and efficiency with mortality.

Background of the Study

Written by Hang Yuan and colleagues (including Tatiana Plekhanova, Rosemary Walmsley, Amy C. Reynolds, and Kathleen J. Maddison, among others), this paper was published in npj Digital Medicine in 2024. The research is motivated by the need to overcome the limitations of existing methods in large-scale applications and to improve sleep stage classification through self-supervised deep learning.

Sleep takes up roughly one-third of a person's life, yet assessing it in free-living environments is extremely difficult. Traditional sleep diaries capture an individual's subjective experience but correlate poorly with objective, device-measured sleep parameters. Laboratory polysomnography, the recognized gold standard for sleep measurement, provides accurate data, but its high cost and complexity make it unsuitable for large-scale studies. Wrist-worn accelerometers, by comparison, are portable and place little burden on the wearer, which makes them better suited to large-scale epidemiological studies.

However, the sleep assessment algorithms in current consumer-grade and research-grade wrist-worn devices are mostly proprietary and validated in small populations, so their measurement validity remains unclear. Sleep classification methods (for example, those distinguishing wakefulness, NREM, and REM sleep) rely primarily on hand-crafted features, which may not use all the information in the signal. Data-driven methods such as deep learning may therefore have an advantage.

Source of the Paper

This paper was co-authored by researchers from several institutions, including the University of Oxford and Seoul National University Bundang Hospital. It was published in npj Digital Medicine in 2024 (DOI: https://doi.org/10.1038/s41746-024-01065-0).

Research Methods and Process

Study Subjects and Data Collection

The study first used multi-task self-supervised learning to extract features from 700,000 person-days of triaxial accelerometer data from the UK Biobank. The authors then fine-tuned the feature extractor together with a deep recurrent neural network (RNN) and trained a sleep stage classifier using PSG as the reference standard. After exclusions, 1448 participant-nights of data were used to develop and evaluate the model: internal validation used 1395 of them and external validation the remaining 53.
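The idea behind multi-task self-supervised learning is that labels can be generated from the unlabelled signal itself: a window of accelerometer data is transformed, and the network is trained to predict which transformations were applied. The sketch below only illustrates that labelling step. The two pretext tasks (time reversal and segment permutation), the function names, and the window layout are assumptions chosen for illustration; the summary above does not say which tasks the authors used.

```python
import random

def apply_transforms(window, reverse, permute):
    """Apply the chosen pretext transforms to one window.

    window: list of (x, y, z) acceleration samples (hypothetical layout).
    reverse, permute: 0/1 flags saying which transforms to apply.
    """
    out = list(window)
    if reverse:
        out = out[::-1]  # flip the time axis
    if permute:
        # Cut into 4 segments and rotate them by one position.
        n = len(out) // 4
        segments = [out[i * n:(i + 1) * n] for i in range(3)] + [out[3 * n:]]
        out = [s for seg in segments[1:] + segments[:1] for s in seg]
    return out

def make_pretext_example(window, rng):
    """Sample transform flags at random; the flags become the training labels."""
    labels = {"reversed": rng.randint(0, 1), "permuted": rng.randint(0, 1)}
    transformed = apply_transforms(window, labels["reversed"], labels["permuted"])
    return transformed, labels

# Toy window: 8 samples whose x value encodes the original time index.
toy_window = [(float(i), 0.0, 1.0) for i in range(8)]
example, labels = make_pretext_example(toy_window, random.Random(0))
```

A network with one output head per task would then be trained to recover `labels` from `example`, and its feature extractor reused for sleep staging.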

Polysomnography Comparison

In internal validation, the mean difference from PSG was 8.9 minutes for total sleep time (95% limits of agreement: -89.0 to 106.9 minutes), -18.7 minutes for REM sleep time (95% limits of agreement: -130.9 to 93.6 minutes), and 27.6 minutes for NREM sleep time (95% limits of agreement: -100.6 to 155.8 minutes).

In external validation, the mean difference was 34.7 minutes for total sleep time (95% limits of agreement: -37.8 to 107.2 minutes), 2.6 minutes for REM sleep time (95% limits of agreement: -68.4 to 73.4 minutes), and 32.1 minutes for NREM sleep time (95% limits of agreement: -54.4 to 118.5 minutes). Overall, the model tended to underestimate REM and short sleep, while overestimating NREM and long sleep.
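Each deviation above is a mean difference with an interval that is symmetric around it. Assuming these intervals are Bland-Altman style 95% limits of agreement (mean difference plus or minus 1.96 standard deviations of the paired differences), they can be computed as in the sketch below. The sign convention (model minus PSG) and the toy durations are assumptions for illustration, not data from the study.

```python
from statistics import mean, stdev

def bland_altman(model_minutes, psg_minutes):
    """Return (bias, lower, upper) for paired per-night durations.

    bias is the mean of (model - PSG); lower and upper are the
    95% limits of agreement, bias -/+ 1.96 * SD of the differences.
    """
    diffs = [m - p for m, p in zip(model_minutes, psg_minutes)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - half_width, bias + half_width

# Made-up total sleep times (minutes) for four nights.
toy_model = [400, 420, 390, 450]
toy_psg = [390, 400, 395, 430]
bias, lower, upper = bland_altman(toy_model, toy_psg)
```

Because the limits are symmetric, the midpoint of any reported interval should equal the reported mean difference, which is a quick way to sanity-check transcribed figures.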

Analysis Using UK Biobank Data

The authors applied the sleep classifier to 100,000 participants in the UK Biobank and examined the association of device-measured sleep duration and efficiency with all-cause mortality. Among 66,214 participants, 1642 deaths were observed. Short sleepers (<6 hours) had a higher risk of death than those with normal sleep duration (6 to 7.9 hours), regardless of their sleep efficiency (hazard ratio: 1.58; 95% CI: 1.19 to 2.11 for low sleep efficiency and hazard ratio: 1.45; 95% CI: 1.16 to 1.81 for high sleep efficiency).
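To make the exposure variables concrete, the sketch below derives sleep duration and sleep efficiency from a sequence of per-epoch stage labels and then assigns a duration group. The 30-second epoch length, the label strings, the definition of efficiency as sleep time divided by time in the sleep window, and the "long" group starting at 8 hours are all assumptions for illustration. Only the normal range of 6 to 7.9 hours comes from the text, with short sleep taken as anything below it.

```python
EPOCH_SECONDS = 30  # assumed epoch length

def summarise_night(stages, epoch_seconds=EPOCH_SECONDS):
    """Summarise one sleep window given labels 'wake', 'nrem' or 'rem'."""
    minutes_per_epoch = epoch_seconds / 60
    nrem = stages.count("nrem") * minutes_per_epoch
    rem = stages.count("rem") * minutes_per_epoch
    total_sleep = nrem + rem
    window = len(stages) * minutes_per_epoch
    return {
        "nrem_min": nrem,
        "rem_min": rem,
        "total_sleep_min": total_sleep,
        # Share of the sleep window actually spent asleep.
        "efficiency": total_sleep / window if window else 0.0,
    }

def duration_group(total_sleep_minutes):
    """Map nightly sleep duration to a group (upper cut-off is an assumption)."""
    hours = total_sleep_minutes / 60
    if hours < 6:
        return "short"
    if hours < 8:
        return "normal"
    return "long"

# Toy night: 30 min awake, 4.5 h NREM, 1 h REM, 30 min awake.
toy_night = ["wake"] * 60 + ["nrem"] * 540 + ["rem"] * 120 + ["wake"] * 60
summary = summarise_night(toy_night)
group = duration_group(summary["total_sleep_min"])
```

In a cohort analysis these per-night values would typically be averaged per participant before grouping; that step is omitted here.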

Research Results

The study found that the deep learning-based sleep classifier agreed well with PSG and remained robust across both internal and external validation. Shorter nightly sleep duration was associated with a higher risk of all-cause mortality, regardless of sleep continuity.

Association Between Sleep Parameters and Mortality

Over 452,327 person-years of follow-up, 1642 deaths were observed. Short sleepers (<6 hours) had a higher risk of death in both the low sleep efficiency group (hazard ratio: 1.58; 95% CI: 1.19 to 2.11) and the high sleep efficiency group (hazard ratio: 1.45; 95% CI: 1.16 to 1.81). The higher the sleep efficiency, the lower the risk of death.
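The figures above allow a few back-of-the-envelope checks. The sketch below computes the crude death rate and mean follow-up from the reported totals (using the 66,214 participants mentioned earlier), and confirms that each hazard ratio sits at the geometric midpoint of its confidence interval, as expected when the interval is symmetric on the log scale. The helper names are hypothetical, and the standard-error formula assumes a normal approximation on the log hazard ratio.

```python
import math

deaths = 1642
person_years = 452_327
participants = 66_214

# Crude (unadjusted) death rate and average follow-up per participant.
rate_per_1000_py = deaths / person_years * 1000
mean_follow_up_years = person_years / participants

def geometric_midpoint(ci_low, ci_high):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(ci_low * ci_high)

def log_hr_standard_error(ci_low, ci_high, z=1.96):
    """Standard error of log(HR) implied by a 95% CI (normal approximation)."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)

hr_low_efficiency = geometric_midpoint(1.19, 2.11)
hr_high_efficiency = geometric_midpoint(1.16, 1.81)
se_low_efficiency = log_hr_standard_error(1.19, 2.11)
```

The crude rate works out to about 3.63 deaths per 1,000 person-years over a mean follow-up of about 6.83 years, and the midpoints round to the reported hazard ratios of 1.58 and 1.45.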

Discussion and Contributions

The study is distinctive in combining large-scale, multi-center data with self-supervised pretraining and deep learning to improve sleep classification accuracy. The results show that shorter nightly sleep duration is associated with a higher risk of death, regardless of sleep efficiency. The method and results open new directions for studying sleep and sleep structure in large accelerometer databases.

Through these research results, the researchers not only validated the effectiveness of deep learning models in sleep classification but also highlighted the significant association between short night sleep and mortality. This study provides a powerful tool and evidence for sleep monitoring and health association research.

Conclusion

This paper develops and validates a self-supervised deep learning method to improve accelerometer sleep classification technology and verifies the method’s effectiveness through large-scale data from the UK Biobank. The study emphasizes the significant impact of short night sleep on health, providing important foundational data and methodological support for subsequent research.

Research Details and Tools

The study used a self-supervised deep learning model (sleepnet) that combines multi-task self-supervised learning for feature extraction with recurrent neural networks (RNNs) for classification. These methods were trained and validated on 700,000 person-days of data from the UK Biobank. The study also used a random forest model to identify windows in unlabeled free-living data, further improving classification accuracy.

The outcomes of this research lay the groundwork for future large-scale sleep health monitoring and analysis, offering a practical and efficient methodological innovation. The open-source algorithms and sleep parameters provided by the study will support research on how sleep and sleep structure relate to health outcomes, providing new scientific evidence for improving public health.