Spectro-Temporal Modulations Incorporated Two-Stream Robust Speech Emotion Recognition

Research on Two-Stream Robust Speech Emotion Recognition Based on Spectro-Temporal Modulation Features. Academic Background: Speech Emotion Recognition (SER) is a technology that identifies emotions by analyzing human speech. It has broad application potential in areas such as human-computer interaction, customer service mana...

Latent Circuit Inference from Heterogeneous Neural Responses during Cognitive Tasks

Inferring Latent Neural Circuits from Heterogeneous Neural Responses During Cognitive Tasks. Academic Background: In cognitive tasks, higher cortical areas of the brain (such as the prefrontal cortex, PFC) are responsible for integrating a variety of sensory, cognitive, and motor signals. However, the responses of individual neurons often exhibit com...

Dynamical Constraints on Neural Population Activity

Temporal Dynamics Constraints on Neural Population Activity: Computational Mechanisms of Neural Activity Revealed by Brain-Computer Interfaces. Academic Background: How neural activity in the brain evolves over time is a central issue in understanding sensory, motor, and cognitive functions. For a long time, network models have posited that the brain...

Neural Mechanisms of Relational Learning and Fast Knowledge Reassembly in Plastic Neural Networks

Neural Mechanisms and Relational Learning: Rapid Knowledge Reassembly in Neural Networks. Background: Humans and animals possess a remarkable ability to learn relationships between items in experience (such as stimuli, objects, and events), enabling structured generalization and rapid information assimilation. A fundamental type of such relational le...

Learning with Enriched Inductive Biases for Vision-Language Models

Learning with Enriched Inductive Biases for Vision-Language Models. Research Background and Problem Statement: In recent years, Vision-Language Models (VLMs) have made significant progress in the fields of computer vision and natural language processing. These models are pre-trained on large-scale image-text pairs to construct a unified multimodal re...

Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation

Advances in General Mammal Pose Estimation Research. Research Background and Problem Statement: In the field of computer vision, pose estimation is a fundamental and crucial task aimed at locating key points of target objects in images. In recent years, human pose estimation has made significant progress, but research on animal pose estimation remain...

SeaFormer++: Squeeze-Enhanced Axial Transformer for Mobile Visual Recognition

SeaFormer++: An Efficient Transformer Architecture Designed for Mobile Visual Recognition. Research Background and Problem Statement: In recent years, the field of computer vision has undergone a significant shift from Convolutional Neural Networks (CNNs) to Transformer-based methods. However, despite Vision Transformers demonstrating excellent glob...

Smaller but Better: Unifying Layout Generation with Smaller Large Language Models

New Breakthrough in Unified Layout Generation Research: Smaller but Stronger Large Language Models. Research Background and Problem Statement: Layout generation is an important research direction in the fields of computer vision and human-computer interaction, aiming to automatically generate graphic interfaces or layout designs that meet specific re...

Towards Boosting Out-of-Distribution Detection from a Spatial Feature Importance Perspective

Boosting Out-of-Distribution Detection Performance from the Perspective of Spatial Feature Importance. Research Background and Problem Statement: In practical applications of deep learning models, ensuring that models can reliably reject predictions when faced with inputs from unknown categories is crucial for system safety and robustness. This need ...

MoonShot: Towards Controllable Video Generation and Editing with Motion-Aware Multimodal Conditions

MoonShot: Towards Controllable Video Generation and Editing with Motion-Aware Multimodal Conditions. Research Background and Problem Statement: In recent years, text-to-video diffusion models (Video Diffusion Models, VDMs) have made significant progress, enabling the generation of high-quality, visually appealing videos. However, most existing VDMs r...