Speech-Induced Suppression During Natural Dialogues

During human communication, the brain processes self-generated speech and others’ speech differently, a phenomenon known as Speech-Induced Suppression (SIS). The mechanism involves a motor efference copy sent to the perception pathway, which acts like an internal “echo” that helps filter self-generated signals so they are not confused with external stimuli. In speech processing, SIS manifests as a specific suppression of responses to one’s own speech, which is of particular relevance to studies of auditory hallucinations in psychopathologies such as schizophrenia. Although SIS has been studied extensively in experiments with isolated syllables, the mechanism remains poorly understood in continuous, natural conversation.

Source Information

This study was conducted by Joaquin E. Gonzalez and colleagues from several institutions, including the Artificial Intelligence Laboratory of the University of Buenos Aires, the Institute of Signals, Systems, and Computational Intelligence, and the Institute of Applied Mathematics. The paper was published in the journal Communications Biology and explores how the brain represents one’s own versus others’ speech, with a particular focus on the SIS effect in natural conversation.

Detailed Research Steps

a) Research Process

The study combined electroencephalography (EEG) with high-quality voice recordings to analyze speech processing during natural, unscripted conversations, following these steps:

  1. Experimental Design: Participants were paired for an object-placement game in which each pair had to communicate verbally to place objects at specified positions on a screen.

  2. Data Collection: Brain activity was recorded with 128-channel high-density EEG while each participant wore a directional microphone for synchronized voice recording.

  3. Signal Preprocessing: The collected EEG signals were filtered and subjected to Independent Component Analysis (ICA) to remove artifacts from eye movements and muscle activity (a minimal preprocessing sketch follows this list).

  4. Feature Extraction: Mel-spectrogram and signal-envelope features were extracted from the speech recordings and served as input for model training (see the feature-extraction sketch after this list).

  5. Encoding Model Construction: An encoding model was trained to predict EEG activity from the speech features, and its performance was validated against the listener’s EEG recorded during the collaborative task (a minimal encoding-model sketch appears below).

  6. Analysis of Conversation Phases: EEG responses were compared across conversational conditions (only the other person speaking, only the participant speaking, both speaking simultaneously, and silence) to quantify the SIS effect; a condition-labeling sketch is also shown below.
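
As a rough illustration of step 3, the snippet below sketches a filter-plus-ICA cleanup using MNE-Python. The file name, filter band, and excluded component indices are placeholders chosen for the example, not the authors’ exact pipeline.

```python
# Minimal preprocessing sketch with MNE-Python; file name, filter band, and
# excluded components are illustrative placeholders, not the paper's settings.
import mne

raw = mne.io.read_raw_fif("participant_01_raw.fif", preload=True)  # hypothetical recording
raw.filter(l_freq=1.0, h_freq=40.0)      # remove slow drift and high-frequency noise

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]                     # components judged ocular/muscular by inspection
raw_clean = ica.apply(raw.copy())        # reconstruct the signal without those components
```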
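
Step 4’s two speech features can be computed along the following lines, here with librosa for the mel-spectrogram and the Hilbert transform for the envelope; all parameter values and file names are illustrative rather than the paper’s settings.

```python
# Sketch of the two speech features used as model input: a mel-spectrogram
# (librosa) and the broadband envelope (Hilbert transform).
import numpy as np
import librosa
from scipy.signal import hilbert, resample_poly

audio, sr = librosa.load("participant_01_mic.wav", sr=16000)   # hypothetical recording

# Mel-spectrogram: log-power time-frequency representation of the speech signal
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=16, hop_length=sr // 128)
log_mel = librosa.power_to_db(mel)

# Envelope: magnitude of the analytic signal, resampled to the EEG rate (here 128 Hz)
envelope = np.abs(hilbert(audio))
envelope_128hz = resample_poly(envelope, up=128, down=sr)
```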
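
Step 5 describes a speech-to-EEG encoding model. A common way to implement such a model is time-lagged ridge regression scored with Pearson’s correlation, sketched below on synthetic placeholder data; the lag window, regularization strength, and hold-out split are assumptions, not the study’s exact configuration.

```python
# Minimal sketch of a time-lagged linear encoding model (ridge regression)
# predicting multichannel EEG from a speech envelope, scored with Pearson's r.
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(feature, n_lags):
    """Stack past samples of a 1-D feature into a (time, n_lags) design matrix."""
    X = np.zeros((len(feature), n_lags))
    for k in range(n_lags):
        X[k:, k] = feature[: len(feature) - k]
    return X

fs = 128                                    # EEG sampling rate (Hz)
rng = np.random.default_rng(0)
envelope = rng.random(60 * fs)              # placeholder: 1 min of speech envelope
eeg = rng.standard_normal((60 * fs, 128))   # placeholder: band-limited EEG (time, channels)

n_lags = int(0.6 * fs)                      # 0-600 ms lag window (illustrative)
X = lag_matrix(envelope, n_lags)

split = int(0.8 * len(X))                   # simple hold-out split
model = Ridge(alpha=1.0).fit(X[:split], eeg[:split])
pred = model.predict(X[split:])

# Pearson correlation per channel between predicted and recorded EEG
r = [np.corrcoef(pred[:, c], eeg[split:, c])[0, 1] for c in range(eeg.shape[1])]
print("mean correlation across channels:", float(np.mean(r)))
```

In practice the design matrix would be built from the envelope or mel-spectrogram features extracted above, the target would be the preprocessed EEG, and cross-validation would replace the single split shown here.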
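
For step 6, the conversation has to be partitioned into listening, speaking, overlapping, and silent intervals. One simple way to do this is to threshold each participant’s speech envelope, as in the hypothetical helper below; the threshold value and names are assumptions for illustration.

```python
# Hypothetical helper for splitting a conversation into analysis conditions
# by thresholding the two participants' speech envelopes.
import numpy as np

def label_conditions(env_self, env_other, threshold=0.1):
    """Return one label per sample: 'self', 'other', 'both', or 'silence'."""
    self_on = env_self > threshold
    other_on = env_other > threshold
    labels = np.full(len(env_self), "silence", dtype=object)
    labels[self_on & ~other_on] = "self"
    labels[~self_on & other_on] = "other"
    labels[self_on & other_on] = "both"
    return labels

# The encoding model can then be evaluated separately on each subset,
# e.g. samples where labels == "other" versus labels == "self".
```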

b) Main Research Findings

  1. Brain Representation of Speech Features: The model reliably reproduced the brain’s representation of the partner’s speech, with strong predictive performance for acoustic features such as pitch and frequency bands. Average correlation coefficients reached 0.26 (envelope) and 0.37 (mel-spectrogram) in the θ frequency band, substantially higher than values reported in previous studies.

  2. Suppression of Self-Generated Speech: In natural conversation, self-generated speech did not elicit significant EEG responses, confirming a pronounced SIS effect. The EEG response to one’s own speech resembled that observed during silence, and significant responses were recorded only when listening to the partner’s speech.

c) Research Conclusions and Significance

The study demonstrates that SIS not only persists in natural conversation but is even more pronounced there than in controlled single-syllable paradigms, highlighting the brain’s differential processing of self-generated and external speech. The approach opens the door to a deeper understanding of these mechanisms in natural contexts, with implications for psychopathology research, language-processing models, and speech-based user interfaces.

d) Research Highlights

  1. SIS Effect in Natural Conversation: The SIS effect was validated for the first time in the context of natural conversations, offering new insight into how the brain distinguishes self-generated from external speech.

  2. High Model Predictive Performance: The encoding model predicted EEG signals substantially better in natural speech contexts than under previous experimental conditions, demonstrating the method’s validity in complex settings.

  3. Independent Verification via EEG Phase Synchronization: Phase Locking Value (PLV) analysis independently confirmed the SIS result, consistently showing that self-generated speech did not produce significant EEG synchronization under natural conversation conditions (a PLV sketch follows this list).
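
For reference, the phase-locking value mentioned in highlight 3 is the magnitude of the time-averaged phase difference between two signals, PLV = |⟨exp(i(φ₁(t) − φ₂(t)))⟩|. The sketch below computes it from Hilbert-transform phases on placeholder signals; in the study’s setting the inputs would be band-passed EEG and speech signals, so the signals and parameters here are illustrative only.

```python
# Minimal PLV sketch: phase-locking value between two band-limited signals.
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV = |mean(exp(i * (phase_x - phase_y)))| over time."""
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))

# Example with placeholder signals (in practice: θ-band speech envelope vs. one EEG channel)
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / 128)
speech = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal(len(t))
eeg_ch = np.sin(2 * np.pi * 5 * t + 0.3) + 0.5 * rng.standard_normal(len(t))
print("PLV:", phase_locking_value(speech, eeg_ch))
```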

e) Other Valuable Information

The proposed encoding model can be applied broadly to EEG analysis in other continuous, unscripted tasks, making it suitable for more complex naturalistic language-processing research and providing a methodological template for future studies.

Conclusion

Through careful experimental design and an innovative encoding-model methodology, this study is the first to reveal the SIS effect in natural conversation, offering a new perspective on how the brain processes natural speech signals. The results extend neuroscience methods into naturalistic settings and pave the way for further neurocognitive and linguistic research based on natural conversations.