Self-Supervised Shutter Unrolling with Events
Event Camera-Based Self-Supervised Shutter Unrolling Method
Research Background and Problem Statement
In computer vision, recovering undistorted global shutter (GS) video from rolling shutter (RS) images has long been a challenging problem. Because RS cameras expose row by row, they produce spatial distortions (e.g., wobble and skew) in dynamic scenes, especially under high-speed motion. Existing methods can correct RS effects by imposing hand-crafted motion assumptions or exploiting dataset-specific characteristics, but these approaches often perform poorly on the complex nonlinear motions found in real scenes. Moreover, many methods rely on synthetic datasets for training, so their performance degrades in the real world due to the “synthetic-to-real” gap.
To address these issues, the authors propose SelfUnroll, a self-supervised learning framework that leverages the high temporal resolution of event cameras to reconstruct high-quality, continuous-time GS video from RS images. The approach not only overcomes the limitations of traditional methods in complex motion scenarios but also avoids reliance on expensive high-speed cameras, reducing data collection costs.
Paper Source and Author Introduction
This paper, titled “Self-Supervised Shutter Unrolling with Events,” was co-authored by Mingyuan Lin, Yangguang Wang, and others, with Mingyuan Lin and Yangguang Wang as co-first authors. The authors are affiliated with the School of Electronic Information at Wuhan University, Xiaomi in Beijing, the Department of Computer Science at ETH Zurich, the School of Computer Science at Peking University, and the School of Artificial Intelligence at Wuhan University. The paper was published in the prestigious international journal International Journal of Computer Vision (IJCV) and was officially accepted in January 2025.
Research Workflow and Experimental Design
a) Research Workflow and Methods
1. Event-based Inter/Intra-frame Compensator (E-IC)
The core of the research is the proposal of a module called the Event-based Inter/Intra-frame Compensator (E-IC). The design goal of E-IC is to achieve flexible conversion between RS and GS images by integrating spatial and temporal information. Specifically, E-IC can handle three modes of conversion: RS to GS (RS2GS), GS to RS (GS2RS), and RS to RS (RS2RS). The key idea is to leverage the high temporal resolution information provided by event streams to predict pixel-level dynamic changes within arbitrary time intervals.
E-IC includes two sub-modules:
- E-IC_T: handles the temporal brightness transition, implemented with a Residual Dense Network (RDN).
- E-IC_S: handles the spatial pixel displacement, implemented on a U-Net architecture.
Finally, E-IC fuses the results of the two compensations to produce a unified output.
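The temporal branch rests on the standard event generation model: an event fires at a pixel when its log intensity changes by a contrast threshold, so summing event polarities between two timestamps predicts the brightness transition. A minimal NumPy sketch of this idea (the contrast threshold `c=0.2` and the array shapes are illustrative assumptions, not the paper's values):

```python
import numpy as np

def brightness_transition(img_t1, event_polarity_sum, c=0.2):
    """Predict intensity at t2 from intensity at t1 via the event
    generation model: log I(t2) - log I(t1) = c * (sum of polarities).

    img_t1:             (H, W) intensities in (0, 1]
    event_polarity_sum: (H, W) signed event count in [t1, t2]
    c:                  contrast threshold (illustrative value)
    """
    log_i = np.log(np.clip(img_t1, 1e-6, None))
    return np.clip(np.exp(log_i + c * event_polarity_sum), 0.0, 1.0)

# Toy example: one pixel brightens (+2 events), one darkens (-2 events).
img = np.full((2, 2), 0.5)
events = np.array([[2.0, -2.0], [0.0, 0.0]])
out = brightness_transition(img, events)
```

A learned network such as E-IC_T replaces this closed-form model because real event streams are noisy and the contrast threshold varies per pixel, but the underlying relation it exploits is the one above.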
2. Self-Supervised Learning Framework
To adapt to the data distribution of real-world scenes, the authors designed a fully self-supervised learning framework built on three constraints:
- Latent consistency (L_LC): maps two consecutive RS images to the same latent GS image, enforcing structural consistency of the reconstruction.
- Cycle consistency (L_CC): enforces brightness stability through the cyclic RS-to-GS-to-RS process.
- Temporal consistency (L_TC): uses the event information between adjacent RS frames to provide robust supervision in the temporal domain.
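Under simplified assumptions (an L1 photometric distance, and placeholder arrays standing in for the network's predictions), the three constraints can be sketched as:

```python
import numpy as np

def l1(a, b):
    return float(np.mean(np.abs(a - b)))

def latent_consistency(gs_from_rs1, gs_from_rs2):
    # Two consecutive RS inputs should map to the same latent GS image.
    return l1(gs_from_rs1, gs_from_rs2)

def cycle_consistency(rs_input, rs_reconstructed):
    # RS -> GS -> RS should return the original RS image.
    return l1(rs_input, rs_reconstructed)

def temporal_consistency(gs_t1, gs_t2, event_predicted_change):
    # The change between two reconstructed GS frames should agree with
    # the brightness change predicted from the events between them.
    return l1(gs_t2 - gs_t1, event_predicted_change)

# Toy check: identical predictions and zero event change give zero loss.
rng = np.random.default_rng(0)
gs_a = rng.random((4, 4))
total = (latent_consistency(gs_a, gs_a)
         + cycle_consistency(gs_a, gs_a)
         + temporal_consistency(gs_a, gs_a, np.zeros((4, 4))))
```

None of the three terms requires a ground-truth GS image, which is what makes the framework trainable directly on real RS-plus-event data.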
3. Motion and Occlusion Aware Fusion Module (MOA)
To address the impact of foreground occlusions and noisy events, the authors further proposed the Motion and Occlusion Aware Module (MOA). The MOA module improves the stability and accuracy of reconstruction by fusing GS results generated from two consecutive RS images.
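The fusion idea can be illustrated as per-pixel confidence weighting: where one GS estimate is unreliable (e.g., occluded), the other is weighted more. The exponential weighting below is an illustrative stand-in for the learned MOA weights, not the paper's implementation:

```python
import numpy as np

def fuse_gs_estimates(gs1, gs2, err1, err2, beta=10.0):
    """Blend two GS reconstructions with per-pixel confidence weights.

    err1/err2 are per-pixel reliability errors (e.g., photometric
    residuals); lower error -> higher weight via exponential scoring.
    """
    w1 = np.exp(-beta * err1)
    w2 = np.exp(-beta * err2)
    return (w1 * gs1 + w2 * gs2) / (w1 + w2)

# Toy example: each estimate is trusted where its error is zero.
gs1 = np.full((2, 2), 0.2)
gs2 = np.full((2, 2), 0.8)
err1 = np.array([[0.0, 1.0], [0.0, 1.0]])  # gs1 unreliable where err1 = 1
err2 = np.array([[1.0, 0.0], [1.0, 0.0]])
fused = fuse_gs_estimates(gs1, gs2, err1, err2)
```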
b) Main Results
1. Performance on Synthetic Datasets
On the Fastec-RS and GEV-RS-Sharp datasets, SelfUnroll demonstrated excellent performance in both single-frame reconstruction and video sequence reconstruction tasks. For example, on the GEV-RS-Sharp dataset, SelfUnroll-M achieved a PSNR of 32.71 dB and an SSIM of 0.934, significantly outperforming existing methods. Additionally, SelfUnroll showed higher robustness in handling complex nonlinear motions.
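For reference, the reported PSNR is computed from the mean squared error between reconstruction and ground truth; a minimal sketch for images normalized to [0, 1] (the peak value is an assumption about normalization):

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, peak]."""
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

gt = np.zeros((4, 4))
pred = np.full((4, 4), 0.1)  # uniform error of 0.1 -> MSE = 0.01
value = psnr(pred, gt)       # 10 * log10(1 / 0.01) = 20 dB
```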
2. Performance on Real-World Datasets
On the GEV-RS-Real and DAVIS-RS-Event (DRE) datasets, SelfUnroll demonstrated strong generalization capabilities. Compared to methods relying on synthetic datasets, SelfUnroll effectively reduced the “synthetic-to-real” gap by directly adapting to real-world data distributions through self-supervised learning.
3. Occlusion Handling Capability
The MOA module performed exceptionally well in addressing occlusion issues. For instance, when restoring areas occluded by foreground objects, SelfUnroll-M could adaptively fuse multi-frame information to avoid color distortion and texture errors.
Research Conclusions and Implications
c) Research Conclusions
The SelfUnroll method successfully achieves high-quality reconstruction from RS images to continuous-time GS videos by combining the high temporal resolution information of event cameras with a self-supervised learning framework. Experimental results show that SelfUnroll not only performs excellently on synthetic datasets but also maintains high performance in real-world scenarios.
d) Scientific and Application Value
This research holds significant scientific and application potential:
- Scientific value: proposes a novel event-based inter/intra-frame compensator (E-IC) and self-supervised learning framework, offering new insights into solving RS correction problems.
- Application value: SelfUnroll can be widely applied in high-speed imaging, motion analysis, and video enhancement, particularly in scenarios requiring cost-effective solutions.
e) Research Highlights
- Proposes a unified RS and GS image conversion method suitable for GS frame reconstruction at arbitrary timestamps.
- First application of self-supervised learning to event camera-based RS correction tasks.
- Designs the MOA module, effectively addressing challenges posed by occlusions and noisy events.
Summary
SelfUnroll is an innovative method that successfully addresses the challenges of RS image correction and continuous-time GS video reconstruction by combining event cameras with self-supervised learning. The proposed E-IC module and MOA module provide important references for future research while also offering efficient solutions for practical applications.