Speech Prosody Serves Temporal Prediction of Language via Contextual Entrainment

Temporal prediction plays an important role in language comprehension, making language processing faster and more efficient. Particularly in complex auditory and linguistic processing, anticipating upcoming information can improve comprehension efficiency and reduce cognitive load. Existing research has shown that listeners can use the rhythmic changes in speech prosody to predict the duration of upcoming sentences, thereby speeding up comprehension.

In language comprehension, predicting when linguistic units will occur matters as much as predicting their content. The slow rhythm of speech prosody can help listeners predict the duration of upcoming sentence fragments. Prosodic cues depend not only on their own acoustic characteristics but, more importantly, on their relation to the preceding and following prosodic context. Previous experiments have shown that even during silent reading, when no sound is heard, readers construct an implicit prosody in the mind, which helps them understand the text.

Research Source

This paper was co-authored by Yulia Lamekina, Lorenzo Titone, Burkhard Maess, and Lars Meyer, affiliated with the Max Planck Institute for Human Cognitive and Brain Sciences and the University Clinic Münster. The paper was accepted on April 8, 2024, and published online ahead of print in The Journal of Neuroscience. The research was funded by the Max Planck Society, and the authors declared no competing financial interests.

Research Process

Experimental Design and Statistical Analysis

Participants

The experiment involved 40 participants (all native German speakers, right-handed, aged between 18 and 35 years, with a mean age of 28), of whom 19 were female. Five participants were excluded because of excessive noise in their MEG data, leaving a final sample of 35. All participants had normal or corrected-to-normal vision, no history of neurological or hearing impairments, and were naive to the purpose of the experiment.

Stimuli and Paradigm

Each trial combined an initial spoken prosody with a subsequent visual target sentence. The prosody was presented at either a fast or a slow rate and repeated three times. After the prosodic exposure ended, the target sentence was presented word by word. The presentation rate of the target sentence matched the duration of the prosody. For example, the longer sentence lasted 1.884 seconds (6 words at 314 milliseconds per word), while the shorter sentence lasted 1.57 seconds (5 words at 314 milliseconds per word).
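The timing arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the 314 ms per-word duration and the word counts come from the text, while the function names are hypothetical and do not come from the study's materials.

```python
# Word-by-word presentation timing, using the figures quoted in the text.
# Function names are hypothetical; they are not taken from the study.

WORD_DURATION_MS = 314  # per-word presentation time given in the text


def sentence_duration_s(n_words, word_ms=WORD_DURATION_MS):
    """Total duration in seconds of a sentence shown word by word."""
    return n_words * word_ms / 1000


def word_onsets_ms(n_words, word_ms=WORD_DURATION_MS):
    """Onset time in milliseconds of each word, relative to sentence start."""
    return [i * word_ms for i in range(n_words)]


long_s = sentence_duration_s(6)   # longer target sentence (6 words)
short_s = sentence_duration_s(5)  # shorter target sentence (5 words)
print(f"long: {long_s:.3f} s, short: {short_s:.3f} s")
print("onsets of the 6-word sentence (ms):", word_onsets_ms(6))
```

The two sentence types differ by exactly one word slot (314 ms), which is the interval in which a sixth word either appears or is omitted.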

Data Recording and Preprocessing

Sensor Space Analysis

The sensor space analysis first evaluated neural tracking at 0.6 Hz and 0.9 Hz by comparing coherence and power values between the fast and slow prosodic conditions. Cluster permutation tests were used to assess differences between conditions, and the results showed that delta-band MEG activity synchronized to the external prosodic rate.
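To make the idea of coherence at a tagged frequency concrete, here is a minimal sketch on synthetic data. It assumes a simple trial-based magnitude-squared coherence computed at a single frequency; the simulated signals, noise levels, trial count, sampling rate, and all function names are assumptions for illustration and do not reproduce the study's MEG pipeline.

```python
import cmath
import math
import random


def fourier_coeff(signal, freq_hz, sfreq_hz):
    """Complex Fourier coefficient of `signal` at a single frequency."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sfreq_hz)
               for k, x in enumerate(signal)) / n


def coherence(trials_a, trials_b, freq_hz, sfreq_hz):
    """Magnitude-squared coherence across trials at one frequency (0 to 1)."""
    cross, pow_a, pow_b = 0j, 0.0, 0.0
    for a, b in zip(trials_a, trials_b):
        fa = fourier_coeff(a, freq_hz, sfreq_hz)
        fb = fourier_coeff(b, freq_hz, sfreq_hz)
        cross += fa * fb.conjugate()
        pow_a += abs(fa) ** 2
        pow_b += abs(fb) ** 2
    return abs(cross) ** 2 / (pow_a * pow_b)


def simulate_trials(rate_hz, n_trials=20, sfreq_hz=50.0, dur_s=10.0, seed=0):
    """Synthetic 'prosody' and 'brain' trials sharing a rhythm at rate_hz."""
    rng = random.Random(seed)
    n = int(sfreq_hz * dur_s)
    prosody, brain = [], []
    for _ in range(n_trials):
        phase = rng.uniform(0, 2 * math.pi)  # random stimulus phase per trial
        prosody.append([math.sin(2 * math.pi * rate_hz * k / sfreq_hz + phase)
                        + rng.gauss(0, 0.2) for k in range(n)])
        # The 'brain' follows the rhythm with a fixed lag plus stronger noise.
        brain.append([math.sin(2 * math.pi * rate_hz * k / sfreq_hz + phase - 0.5)
                      + rng.gauss(0, 1.0) for k in range(n)])
    return prosody, brain


SFREQ = 50.0
coh = {}
for name, rate in (("slow", 0.6), ("fast", 0.9)):
    prosody, brain = simulate_trials(rate, sfreq_hz=SFREQ)
    coh[name] = {f: coherence(prosody, brain, f, SFREQ) for f in (0.6, 0.9)}
    print(name, {f: round(c, 3) for f, c in coh[name].items()})
```

In this toy setup, coherence is high only at the frequency that matches the simulated prosodic rate, which is the qualitative pattern the analysis looks for. Real data would additionally require artifact handling, spectral estimation choices, and correction across sensors.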

Main Research Findings

Sensor Space Results

In the prosodic coherence analysis, the slow rate condition showed a significant coherence peak at 0.6 Hz, while the fast rate condition peaked at 0.9 Hz. The results indicate that activity in the auditory cortex synchronized with the prosodic rate. Furthermore, during target sentence processing, the sensor space analysis showed that delta-band activity persisted during the visual stimulation phase and shifted to the left prefrontal cortex, which is consistent with a role in predictive processing.
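The cluster permutation tests mentioned in the sensor space analysis build on a simpler idea: if there is no condition effect, the sign of each participant's condition difference is arbitrary. The sketch below shows only that core step, as a paired sign-flip permutation test on one value per participant. It is deliberately simplified (no clustering over sensors, time, or frequency), the per-participant differences are synthetic, and the function name and parameters are hypothetical. Only the sample size of 35 is taken from the text.

```python
import random


def sign_flip_pvalue(diffs, n_perm=2000, seed=0):
    """Two-sided p-value for mean(diffs) != 0 via random sign flips."""
    rng = random.Random(seed)
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    count = 0
    for _ in range(n_perm):
        # Under the null hypothesis, each paired difference may change sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped) >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Synthetic per-participant differences (slow minus fast coherence at 0.6 Hz).
# These numbers are made up for illustration; they are not the study's data.
data_rng = random.Random(1)
N_PARTICIPANTS = 35  # final sample size reported in the text
effect_diffs = [data_rng.gauss(0.10, 0.05) for _ in range(N_PARTICIPANTS)]
null_diffs = [data_rng.gauss(0.00, 0.05) for _ in range(N_PARTICIPANTS)]

p_effect = sign_flip_pvalue(effect_diffs)
p_null = sign_flip_pvalue(null_diffs)
print(f"simulated effect: p = {p_effect:.4f}")
print(f"simulated null:   p = {p_null:.4f}")
```

A cluster-based version would apply the same sign flips to whole sensor-by-frequency maps and compare the largest cluster statistic in each permutation with the observed one, which controls for multiple comparisons across sensors.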

Source Space Results

In the source space analysis, the right early auditory cortex and superior temporal gyrus exhibited significant coherence peaks at 0.6 Hz for the slow rate condition and at 0.9 Hz for the fast rate condition. This suggests that brain activity synchronized to the rate of the speech prosody in each condition.

Phrasing and Effect Analysis

In the short sentence condition, when the sentence ended before the expected length was reached, the event-related field (ERF) showed a significant omission response. Sensor space analysis detected a significant positive-negative cluster in the right centro-parietal region. This indicates that after exposure to the slow prosodic rate, participants predicted a longer sentence, and when the target sentence turned out shorter than expected, they produced an omission ERF effect.

Conclusion and Application Value

This study demonstrated that the rhythm of speech prosody, through the electrophysiological inheritance of contextual rhythm, provides temporal predictions for language comprehension. This finding supports recent evidence for a more general auditory temporal prediction mechanism, and the results highlight the importance of incorporating temporal prediction into psycholinguistic and cognitive neuroscience models. The research suggests that temporal prediction in auditory-motor interactions facilitates bottom-up perceptual processing through top-down phase-resetting of electrophysiological activity. This mechanism may be relevant to interpersonal communication, where prosodic synchrony in conversation may enhance mutual understanding, and it may also help clarify the predictive processes at work in dialogue.

Another potential application of the research findings is to improve the diagnostic and treatment strategies for patients with language disorders by enhancing their sensitivity to language rhythm, thereby improving their language comprehension abilities.

Research Highlights

  1. Important Finding: Brain activity synchronized to the different rates of speech prosody, and this synchronization influenced the subsequent processing of visual sentences.
  2. Novel Methodology: Using MEG combined with a cross-modal experimental design, the study demonstrated the neural mechanisms by which prosodic synchrony supports temporal prediction in language comprehension.
  3. Application Value: The research results have potential rehabilitation applications and could be used in the treatment of patients with language disorders to improve their language comprehension abilities.

This research has significant scientific and practical value in revealing the temporal prediction function of speech prosody in language comprehension and its neural mechanisms.