Related Articles

A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations

This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper). We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension. Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model. The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and language processing, where sensory and motor regions better align with the model’s speech embeddings, and higher-level language areas better align with the model’s language embeddings. The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding post articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing neural activity supporting natural speech and language. These findings support a paradigm shift towards unified computational models that capture the entire processing hierarchy for speech comprehension and production in real-world conversations.
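The core method in this abstract is a linear encoding model: a regularized linear map from model embeddings to neural activity, evaluated on held-out data. A minimal sketch of that idea using closed-form ridge regression on synthetic data (all shapes, dimensions, and values here are illustrative placeholders, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 500 words, 64-dim embeddings, 10 electrodes.
n_words, n_dims, n_electrodes = 500, 64, 10
X = rng.standard_normal((n_words, n_dims))            # word embeddings
true_W = rng.standard_normal((n_dims, n_electrodes))  # synthetic ground truth
Y = X @ true_W + 0.1 * rng.standard_normal((n_words, n_electrodes))

# Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y
alpha = 1.0
W = np.linalg.solve(X.T @ X + alpha * np.eye(n_dims), X.T @ Y)

# Encoding performance: per-electrode correlation between predicted
# and observed activity (in practice, on held-out conversations).
Y_hat = X @ W
r = [np.corrcoef(Y[:, e], Y_hat[:, e])[0, 1] for e in range(n_electrodes)]
print(f"mean encoding correlation: {np.mean(r):.2f}")
```

In the study itself the predictors are acoustic, speech, and language embeddings extracted from Whisper and the targets are electrocorticography signals; the sketch above only shows the shared linear-mapping structure.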

Ultrafast humidity sensor and transient humidity detections in high dynamic environments

Limited by the adsorption and diffusion rate of water molecules, traditional humidity sensors, such as those based on polymer electrolytes, porous ceramics, and metal oxides, typically have long response times, which hinder their application in monitoring transient humidity changes. Here we present an ultrafast humidity sensor with a millisecond-level response. The sensor is prepared by assembling monolayer graphene oxide quantum dots on silica microspheres using a simple electrostatic self-assembly technique. Benefiting from the joint action of the microspheres and the ultrathin humidity-sensitive film, it displays the fastest response time (2.76 ms) and recovery time (12.4 ms) among electronic humidity sensors. With the ultrafast response of the sensor, we revealed the correlation between humidity changes in speech airflow and speech activities, demonstrated the noise immunity of humidity-based speech activity detection, confirmed the humidity shock caused by explosions, realized ultrahigh-frequency respiratory monitoring, and verified the effect of humidity triggering in the non-invasive ventilator. This ultrafast humidity sensor has broad application prospects in monitoring transient humidity changes.

Cultural nuances in subtitling the religious discourse marker wallah in Jordanian drama into English

This study examines the strategies and challenges of subtitling the religious discourse marker والله wallah (by God) in Jordanian Arabic drama on Netflix. Two works, the series Jinn (2019) and the film Theeb (2014), are chosen as the corpus of the data. The study analyses the pragmatic functions of the religious marker wallah, which Arabs usually use to swear to God in different contexts, and examines its English subtitles. The theoretical framework partially employs Vinay and Darbelnet's (1995) literal translation and omission strategies and Baker's (2018) translation approaches, including equivalence and paraphrase. A qualitative analysis is conducted to analyse the functions of occurrences of this marker in its pragmatic context, along with its subtitling into English. The study found that the religious marker is frequently omitted in the subtitles or rendered into various linguistic elements such as speech acts, intensifiers, emphatic expressions, filler words, and sarcastic utterances. In some instances, wallah was either paraphrased or literally translated. The study concludes that it is necessary to employ unique techniques to overcome the cultural and linguistic gaps, depending on the function of the religious discourse marker, and to improve the reliability and quality of interpreting religious markers in audiovisual settings.

Low-power Spiking Neural Network audio source localisation using a Hilbert Transform audio event encoding scheme

Sound source localisation is used in many consumer devices to isolate audio from individual speakers and reject noise. Localisation is frequently accomplished by "beamforming", which combines phase-shifted audio streams to increase power from chosen source directions, under a known microphone array geometry. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. These approaches achieve high accuracy, but narrowband beamforming is computationally demanding and not ideal for low-power IoT devices. We introduce a method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). We use a Hilbert transform to avoid dense band-pass filters, and introduce an event-based encoding method that captures the phase of the complex analytic signal. Our approach achieves high accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming. We deploy our method to low-power SNN inference hardware, with much lower power consumption than super-resolution methods. We demonstrate that signal processing approaches co-designed with spiking neural network implementations can achieve much improved power efficiency. Our Hilbert-transform-based method for beamforming can also improve the efficiency of traditional digital signal processing.
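The phrase "phase of the complex analytic signal" refers to a standard construction: the Hilbert transform turns a real signal into a complex analytic signal whose angle gives instantaneous phase without any band-pass filter bank. A minimal FFT-based sketch of that step (the test tone and sample rate are illustrative, and this shows only the analytic-signal computation, not the paper's event encoding or SNN):

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT-based Hilbert transform:
    keep DC, double positive frequencies, zero negative ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0       # Nyquist bin kept once for even n
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

# Illustrative example: a 40 Hz tone sampled at 1 kHz for 1 s.
fs = 1000.0
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * 40 * t)

z = analytic_signal(x)
phase = np.angle(z)       # instantaneous phase in radians
envelope = np.abs(z)      # instantaneous amplitude

# For a pure tone the envelope is ~1 and the phase advances
# at 2*pi*40 rad/s; events could be emitted at phase crossings.
print(round(float(np.median(envelope)), 2))  # → 1.0
```

In an event-based scheme, spikes would then be generated at fixed crossings of this phase (e.g. zero crossings of the analytic signal), so that inter-microphone phase differences, and hence source direction, are carried by spike timing alone.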

Error-driven upregulation of memory representations

Learning an association does not always succeed on the first attempt. Previous studies associated increased error signals in posterior medial frontal cortex with improved memory formation. However, the neurophysiological mechanisms that facilitate post-error learning remain poorly understood. To address this gap, participants performed a feedback-based association learning task and a 1-back localizer task. Increased hemodynamic responses in posterior medial frontal cortex were found for internal and external origins of memory error evidence, and during post-error encoding success as quantified by subsequent recall of face-associated memories. A localizer-based machine learning model displayed a network of cognitive control regions, including posterior medial frontal and dorsolateral prefrontal cortices, whose activity was related to face-processing evidence in the fusiform face area. Representation strength was higher during failed recall and increased during encoding when subsequent recall succeeded. These data enhance our understanding of the neurophysiological mechanisms of adaptive learning by linking the need for learning with increased processing of the relevant stimulus category.
