Stereoscopic Artificial Compound Eyes for Spatiotemporal Perception in Three-Dimensional Space
This research article, “Stereoscopic Artificial Compound Eyes for Spatiotemporal Perception in Three-Dimensional Space,” was published in the journal Science Robotics on May 15, 2024. The first author is Byungjoon Bae, and the corresponding author is Kyusang Lee. The research team is based primarily in the Departments of Electrical and Computer Engineering and Materials Science and Engineering at the University of Virginia.
Research Background
In nature, the compound eyes of arthropods are highly efficient biological visual systems with a broad field of view (FOV) and high motion sensitivity. The praying mantis, in particular, possesses stereoscopic vision, enabling it to recognize objects in three-dimensional space. Traditional compound eyes, limited to monocular vision, struggle to obtain depth information about static objects. To address this limitation, the research team drew inspiration from the praying mantis’s visual system and designed an artificial compound eye system that imitates the mantis’s stereoscopic vision for spatiotemporal object perception and tracking in three-dimensional space.
Research Methods
Compound Eye Design and Fabrication
The study used heterogeneous integration technology to fabricate flexible photodiodes based on indium gallium arsenide (InGaAs) thin films, which were combined with hafnium oxide (HfO₂)-based resistive random-access memory (ReRAM) units to form a one-photodiode, one-resistor (1P-1R) focal plane array (FPA). This FPA was manufactured into a hemispherical shape to mimic the hemispherical structure of the mantis’s compound eyes and integrated with a custom circuit board through 3D printing to achieve optical sensing and three-dimensional object detection.
The specific fabrication process includes:
1. Using epitaxial liftoff technology to produce InGaAs photodiodes.
2. Integrating photodiodes with HfO₂ ReRAM units on flexible Kapton substrates.
3. Coating each photodiode with a PMMA-based microlens array to enhance focusing capability.
4. Forming the FPA into a hemispherical structure with a radius of 20 millimeters to achieve stereoscopic perception.
Signal Processing and Data Analysis
To achieve rapid response while minimizing latency, data storage, and data transmission overhead, the research team processed visual information at the system edge using synaptic devices and a federated split learning algorithm. The system’s encoded output (spatiotemporal information processed at the pixel level) was further processed by an artificial neural network (ANN) on a local processor. The specific methods are as follows:
1. Integrating ReRAM devices in each pixel for rapid motion sensing.
2. Encoding spatiotemporal information directly at the pixel level through unified storage and reading processes, thereby reducing power consumption.
3. Achieving high-precision, low-latency data processing by combining compact split learning (SL) and federated learning (FL) into a federated split learning (FSL) algorithm.
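The general idea of combining split learning with federated learning can be illustrated with a minimal sketch. It is not the authors' implementation: the two-part linear model, the function names (fed_avg, split_step, train_round), the learning rate, and the layer sizes are all assumptions chosen for illustration. Each client runs the front part of the model and sends its activations to a server that holds the back part (split learning). After local training, the client-side weights are averaged in proportion to each client's data size (federated averaging).

```python
import numpy as np

def fed_avg(weights, sizes):
    """Average per-client weight arrays, weighted by client dataset size."""
    sizes = np.asarray(sizes, dtype=float)
    return np.tensordot(sizes / sizes.sum(), np.stack(weights), axes=1)

def loss(w_client, w_server, x, y):
    """Half mean squared error of the two-part linear model (hypothetical)."""
    err = (x @ w_client) @ w_server - y
    return 0.5 * np.mean(np.sum(err ** 2, axis=1))

def split_step(w_client, w_server, x, y, lr=0.01):
    """One split-learning gradient step.

    The client computes activations h and sends them to the server; the
    server computes the residual and returns the gradient with respect to h.
    """
    n = len(x)
    h = x @ w_client                   # client-side forward pass
    err = h @ w_server - y             # server-side forward pass and residual
    grad_server = h.T @ err / n        # gradient for the server-side weights
    grad_h = err @ w_server.T          # gradient sent back to the client
    grad_client = x.T @ grad_h / n     # gradient for the client-side weights
    return w_client - lr * grad_client, w_server - lr * grad_server

def train_round(w_client_global, w_server, client_data, lr=0.01, local_steps=5):
    """One federated round: local split training per client, then averaging."""
    local_weights, sizes = [], []
    for x, y in client_data:
        w_c = w_client_global.copy()   # each client starts from the global copy
        for _ in range(local_steps):
            w_c, w_server = split_step(w_c, w_server, x, y, lr)
        local_weights.append(w_c)
        sizes.append(len(x))
    return fed_avg(local_weights, sizes), w_server
```

In this sketch only activations and their gradients cross the client-server boundary, never the raw inputs, which is the property that makes split learning attractive for reducing data transmission at the edge.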
Simulation and Experimental Verification
To validate the system’s functionality, the research team conducted feasibility studies through three-dimensional ray tracing simulations using 100,000 training data points and 20,000 test data points. The system’s accuracy was evaluated by calculating the root mean square error (RMSE), with results showing an RMSE below 0.3 centimeters when tracking moving objects and a processing time of 1.8 milliseconds, even on a low-performance microprocessor.
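As a rough illustration of the accuracy metric, the sketch below computes an RMSE over predicted and true three-dimensional positions. It assumes the error for each sample is the Euclidean distance between the predicted and true positions; the summary does not say whether the authors compute RMSE this way or per axis, and the function name rmse_3d is hypothetical.

```python
import numpy as np

def rmse_3d(predicted, actual):
    """RMSE over N samples of 3D positions, using Euclidean error per sample.

    Both inputs are array-likes of shape (N, 3) in the same length unit.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    squared_distance = np.sum((predicted - actual) ** 2, axis=1)
    return float(np.sqrt(np.mean(squared_distance)))
```

With positions given in centimeters, the returned value is also in centimeters, so it can be compared directly against a threshold such as 0.3.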
System Structure and Advantages
Compared to traditional complementary metal-oxide-semiconductor (CMOS) imaging systems, the designed compound eye system demonstrated significant advantages in energy efficiency and processing speed. Traditional systems require complex peripheral circuits and substantial storage space, whereas the artificial compound eye system drastically reduces data transmission and power consumption through integrated perception and processing.
Results and Discussion
Experimental results showed that the artificial compound eye system achieved efficient, low-power spatiotemporal object perception and tracking in three-dimensional space. It maintained a root mean square error of approximately 0.3 centimeters during object tracking, with the sensing and tracking process consuming only about 4 millijoules of energy, more than 400 times less than traditional CMOS imaging systems. The combination of the FSL algorithm and synaptic devices is what allowed the system to process data quickly and accurately at this low power.
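Taking the two reported figures at face value, the implied energy of a comparable CMOS imaging system can be estimated as follows. The symbols E_eye and E_CMOS are labels introduced here for illustration, and the result is only a rough lower bound because both inputs are approximate.

```latex
E_{\text{CMOS}} \gtrsim 400 \times E_{\text{eye}} \approx 400 \times 4\,\text{mJ} = 1600\,\text{mJ} = 1.6\,\text{J}
```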
Significance and Application Value
This research demonstrates the feasibility of mimicking complex natural vision systems and shows that hardware and software co-design can significantly enhance edge computing and perception capabilities. The system is expected to be valuable in applications that require real-time three-dimensional spatial perception and processing, such as autonomous driving and drone navigation. By imitating the mantis’s compound eye system and incorporating edge computing technology, the study shows how an artificial vision system can precisely perceive and track three-dimensional objects. The approach offers new ideas for the design of artificial vision systems and supports the development of low-power, high-efficiency visual processing technologies for practical use.