A bilingual speech neuroprosthesis driven by cortical articulatory representations shared between languages

Background

In neuroprosthesis development, research on decoding language from brain activity has focused almost exclusively on a single language, so the extent to which bilingual speech production relies on cortical activity that is unique to each language or shared between them remains unclear. The present study records speech motor cortex activity from a Spanish-English bilingual participant using electrocorticography (ECoG) and combines deep learning with statistical natural language models to translate that activity into sentences in both languages. The aim is a practical bilingual decoder: speech decoding without the user having to manually specify the target language.

Anarthria, the loss of the ability to articulate speech, is a severe symptom of neurological conditions such as stroke and amyotrophic lateral sclerosis. Invasive speech brain-computer interfaces (BCIs) are being developed to restore natural communication by decoding cortical activity, but existing speech-BCI research has concentrated on a single language, mostly English or Dutch, largely because of the participants studied. As a result, there is little work on neuroprostheses for bilingual users or for non-English languages. Roughly two-thirds of the world’s population is bilingual, and studies show that bilinguals frequently use different languages in different social contexts, something closely tied to personality and worldview. Designing BCI systems capable of multilingual decoding is therefore essential if communication is to be restored for all potential beneficiaries.

Paper Source

This paper was authored by Alexander B. Silva, Jessie R. Liu, Sean L. Metzger, and colleagues from the Department of Neurosurgery and the Weill Institute for Neurosciences at the University of California, San Francisco (UCSF), and the University of California, Berkeley. It was published in Nature Biomedical Engineering on April 1, 2024 (DOI: https://doi.org/10.1038/s41551-024-01207-5).

Research Details

Research Process

  1. System Activation and Sentence Decoding:

    • The participant attempts to speak, and the speech-detection module identifies the initial attempt. Once an attempt is detected, the system cues the next word every 3.5 seconds, recording and processing neural features for each attempt (see the first sketch after this list).
    • The bilingual vocabulary comprises 51 English words and 50 Spanish words. The model generalizes across languages by exploiting articulatory representations shared between them, and transfer learning allows neural data recorded in one language to improve decoding performance in the other.
  2. Vocabulary and Language Model:

    • The model uses a shared bilingual syllable classifier and prioritizes linguistically valid phrases with a language model (LM) for each language; verbs are conjugated according to context, and the highest-scoring sentence across the two language models is displayed (see the second sketch after this list).
  3. Model Training and Evaluation:

    • The classification and detection models are trained on data from an isolated-target task, in which the participant attempts to produce cued words while high-gamma activity (HGA) and low-frequency signal (LFS) features are recorded for prediction.
    • A “copy typing” task is used for evaluation: the participant reproduces randomly selected English and Spanish phrases as prompted, with performance measured primarily by word error rate (WER; a reference implementation appears in the third sketch after this list).
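
To make item 1 concrete, here is a minimal Python sketch of the detect-then-window loop, assuming a toy threshold detector, a 200 Hz feature stream, and an arbitrary feature dimension; the paper’s actual speech-detection module is a trained neural network, and these parameters are illustrative only.

```python
import numpy as np

# Illustrative assumptions -- not the paper's actual parameters.
FEATURE_RATE_HZ = 200   # assumed rate of the ECoG feature stream
WINDOW_S = 3.5          # decoding window per word attempt (from the paper)
N_FEATURES = 512        # assumed combined HGA + LFS feature dimension

def detect_speech_onset(features: np.ndarray, threshold: float = 2.0) -> int | None:
    """Toy stand-in for the speech-detection model: return the first time
    step where mean feature activity exceeds a fixed threshold."""
    energy = features.mean(axis=1)
    onsets = np.flatnonzero(energy > threshold)
    return int(onsets[0]) if onsets.size else None

def word_windows(features: np.ndarray, onset: int):
    """Yield consecutive 3.5 s feature windows starting at the detected
    onset, one window per cued word attempt."""
    step = int(WINDOW_S * FEATURE_RATE_HZ)
    for start in range(onset, len(features) - step + 1, step):
        yield features[start:start + step]

# Synthetic demonstration: quiet baseline, then a simulated speech attempt.
rng = np.random.default_rng(0)
stream = rng.normal(size=(4000, N_FEATURES))
stream[1200:] += 3.0
onset = detect_speech_onset(stream)
print(f"onset at sample {onset}, {len(list(word_windows(stream, onset)))} word windows")
```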
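
The language-model step in item 2 can likewise be illustrated with a toy rescoring routine: each candidate word sequence is scored under an English and a Spanish model, and the overall best-scoring sentence, which implicitly selects the language, is the one displayed. The unigram “models” and the additive combination with a neural score below are simplifications of the paper’s n-gram language models and beam search.

```python
# A toy illustration of bilingual language-model rescoring: score candidate
# word sequences under per-language models and display the overall winner.
# These unigram log-probabilities are made up, not the paper's LMs.
TOY_LMS = {
    "english": {"i": -1.0, "am": -1.2, "thirsty": -2.5},
    "spanish": {"yo": -1.0, "estoy": -1.3, "sediento": -2.6},
}

def lm_score(words, lm, oov_logp=-8.0):
    """Log-probability of a word sequence under a toy unigram model."""
    return sum(lm.get(w, oov_logp) for w in words)

def best_sentence(candidates):
    """Combine each candidate's (made-up) neural log-likelihood with each
    language model's score; the best total implicitly picks the language."""
    scored = [
        (lm_score(words, lm) + neural_logp, lang, " ".join(words))
        for neural_logp, words in candidates
        for lang, lm in TOY_LMS.items()
    ]
    return max(scored)

candidates = [(-2.0, ["i", "am", "thirsty"]),
              (-1.5, ["yo", "estoy", "sediento"])]
print(best_sentence(candidates))  # (-6.4, 'spanish', 'yo estoy sediento')
```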
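
Finally, the evaluation metric in item 3 is standard: WER is the word-level edit distance between the decoded sentence and the prompt, divided by the number of words in the prompt. A self-contained reference implementation:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: Levenshtein edit distance over words, divided by the
    number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(word_error_rate("yo estoy muy bien", "yo estoy bien"))  # 0.25
```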

Research Results

  1. Bilingual Speech Neuroprosthesis Performance:

    • The system flexibly decodes both English and Spanish phrases. With neural features recorded by a high-density ECoG array and an optimized decoding model, the median online word error rate (WER) across test blocks is 25.0% (99% CI: 17.2%, 36.4%); without language modeling, the WER rises to 70.6% (99% CI: 61.9%, 78.1%).
  2. Speech Detection and Language Classification:

    • A recurrent neural network (RNN) classifier processes the neural features in each 3.5-second window and outputs a probability distribution over the 104 bilingual words (see the first sketch after this list). During free decoding, the system identifies the participant’s intended language with 87.5% accuracy (99% CI: 85.7%, 100%), significantly better than chance and than language selection based directly on neural activity, underscoring the role of language modeling in choosing the correct language.
  3. Shared Syllable Representation:

    • The participant exhibits similar neural activity patterns when attempting to speak in either language, further evidence of articulatory representations shared across languages; models trained on data from one language can effectively classify speech attempts in the other (see the second sketch after this list).
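
For a concrete picture of the classifier in item 2, the following PyTorch sketch maps one 3.5-second feature window to a probability distribution over 104 words. Only the window length and vocabulary size come from the paper; the GRU architecture, layer sizes, feature dimension, and sampling rate are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of an RNN word classifier in the spirit of the paper's
# decoder; architecture and feature dimension are assumptions.
N_FEATURES, N_WORDS = 512, 104

class WordClassifier(nn.Module):
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(N_FEATURES, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, N_WORDS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(x)        # h: (num_layers, batch, hidden)
        return self.head(h[-1])   # logits over the bilingual vocabulary

model = WordClassifier()
window = torch.randn(1, 700, N_FEATURES)  # one 3.5 s window at an assumed 200 Hz
probs = torch.softmax(model(window), dim=-1)
print(probs.shape)  # torch.Size([1, 104]); probabilities sum to 1
```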
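
The cross-language result in item 3 can be mimicked with simulated data: if per-syllable neural patterns are shared between languages up to a small offset (true here by construction), a classifier trained on one language’s trials transfers to the other. Everything below is synthetic and mirrors the paper’s analysis in spirit only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated cross-language evaluation: class means are shared across the two
# "languages" by construction, mimicking shared articulatory representations.
rng = np.random.default_rng(1)
n_syllables, n_features, trials = 10, 50, 40
means = rng.normal(scale=2.0, size=(n_syllables, n_features))  # shared code

def simulate(language_shift: float):
    """Draw noisy trials around the shared class means, plus a small
    language-specific offset."""
    X = np.vstack([m + rng.normal(size=(trials, n_features)) + language_shift
                   for m in means])
    y = np.repeat(np.arange(n_syllables), trials)
    return X, y

X_en, y_en = simulate(0.0)   # "English" trials
X_es, y_es = simulate(0.3)   # "Spanish" trials, slightly offset

# Train on one language, test on the other.
clf = LogisticRegression(max_iter=1000).fit(X_en, y_en)
print(f"cross-language accuracy: {clf.score(X_es, y_es):.2f}")
```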

Research Conclusions

The study shows that shared cortical speech representations persist after paralysis and can be decoded across languages without a separate decoder for each one. Transfer learning can substantially improve decoding of a new language’s vocabulary using previously collected neural data, reducing training time and participant burden, as sketched below.
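
A minimal sketch of that transfer-learning recipe, under my assumption (for illustration) that it reduces to freezing a recurrent encoder trained on one language and fitting only a new output layer for the other language’s vocabulary; the paper’s actual procedure and model sizes may differ.

```python
import torch
import torch.nn as nn

# Transfer-learning sketch: freeze an encoder assumed to have been trained
# on one language's neural data; fit only a new head for the new vocabulary.
N_FEATURES, HIDDEN, NEW_VOCAB = 512, 256, 50

encoder = nn.GRU(N_FEATURES, HIDDEN, batch_first=True)  # stand-in for a pretrained model
for p in encoder.parameters():
    p.requires_grad = False                              # reuse, don't retrain
head = nn.Linear(HIDDEN, NEW_VOCAB)                      # only this layer is trained

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 700, N_FEATURES)    # a small batch of new-language trials
y = torch.randint(0, NEW_VOCAB, (8,))  # their word labels
_, h = encoder(x)                      # frozen shared representation
loss = loss_fn(head(h[-1]), y)
loss.backward()
optimizer.step()
print(f"fine-tuning loss: {loss.item():.3f}")
```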

Research Highlights

  1. Solved Bilingual Decoding Problem:

    • By exploiting articulatory representations shared between languages, bilingual speech decoding is achieved for the first time without the user manually specifying the target language.
  2. Rapid Transfer Learning of Models:

    • Neural data from one language is leveraged to improve decoding performance in the other, greatly reducing training time and burden for bilingual users.
  3. Stable System Performance:

    • The decoding model maintains stable performance for more than 40 days without frequent recalibration.
  4. Extensive Application Prospects:

    • This technology opens new possibilities for BCI applications in bilingual and non-English languages, with significant clinical and scientific research value.

Other Valuable Information

Although the study involves only one participant, the strong articulatory representations shared between the two languages suggest good generalization to others, particularly people who acquired their second language early, which tends to be associated with stronger shared representations. Future research should also examine how language proficiency, age of acquisition, and similarity to the native language affect these shared representations.

Summary

This study demonstrates the feasibility of a bilingual speech neuroprosthesis that flexibly decodes the user’s intended language and generalizes between languages with minimal training data, offering a practical route to restoring natural communication for paralyzed patients. Beyond advancing multilingual BCI development, it provides a useful reference framework for future studies.