Related Articles

A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations

This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain. We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper). We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension. Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model. The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and language processing: sensory and motor regions better align with the model’s speech embeddings, and higher-level language areas better align with the model’s language embeddings. The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding post-articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing neural activity supporting natural speech and language. These findings support a paradigm shift towards unified computational models that capture the entire processing hierarchy for speech comprehension and production in real-world conversations.
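
To make the encoding-model idea concrete, here is a minimal sketch: ridge regression maps per-word embeddings onto electrode activity and is scored on held-out data by per-electrode correlation. Random arrays stand in for the Whisper embeddings and the ECoG recordings, and all shapes, names and the RidgeCV choice are illustrative assumptions, not the authors’ pipeline.

```python
# Minimal sketch of a linear encoding model: ridge regression maps
# per-word embeddings onto electrode activity, evaluated on held-out data.
# Random arrays stand in for Whisper embeddings and ECoG recordings;
# all shapes and names are illustrative, not the authors' pipeline.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_words, emb_dim, n_electrodes = 5000, 384, 64

X = rng.standard_normal((n_words, emb_dim))           # word-level embeddings
W_true = rng.standard_normal((emb_dim, n_electrodes))
Y = X @ W_true + rng.standard_normal((n_words, n_electrodes))  # "neural" signal

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# Fit one multi-output ridge model with cross-validated regularization
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Encoding performance: correlation between predicted and actual
# activity, computed separately for each electrode
r = [np.corrcoef(Y_hat[:, e], Y_te[:, e])[0, 1] for e in range(n_electrodes)]
print(f"mean held-out correlation: {np.mean(r):.3f}")
```

In the study itself the predictors would be acoustic, speech and language embeddings extracted from Whisper for each word, and the targets would be electrode recordings from held-out conversations; the sketch only shows the linear-mapping-plus-held-out-correlation logic.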

Language measures correlate with other measures used to study emotion

Researchers are increasingly using language measures to study emotion, yet less is known about whether language relates to other measures often used to study emotion. Building on previous work, which focused on associations between language and self-report, we test associations between language and a broader range of measures (self-report, observer report, facial cues, vocal cues). Furthermore, we examine associations across different dictionaries (LIWC-22, NRC, Lexical Suite, ANEW, VADER) used to estimate valence (i.e., positive versus negative emotion) or discrete emotions (i.e., anger, fear, sadness) in language. Associations were tested in three large, multimodal datasets (Ns = 193–1856; average word count = 316.7–2782.8). Language consistently related to observer report and, in two of the three datasets, to self-report. Statistically significant associations between language and facial cues emerged for language measures of valence but not for language measures of discrete emotions. Language did not consistently show significant associations with vocal cues. Results did not tend to vary significantly across dictionaries. The current research suggests that language measures (in particular, language measures of valence) are correlated with a range of other measures used to study emotion. Therefore, researchers may wish to use language to study emotion when other measures are unavailable or impractical for their research question.
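
As a toy illustration of the dictionary-based approach, the sketch below scores two short texts for valence with VADER, one of the dictionaries the study compares. The example texts are invented, and using the ‘compound’ score as the valence measure is an assumption about how one might operationalize it, not the authors’ exact pipeline.

```python
# Toy illustration of dictionary-based valence scoring with VADER
# (one of the dictionaries compared in the study).
# Install with: pip install vaderSentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
texts = [
    "I am so happy and grateful today!",
    "This was a frustrating, miserable experience.",
]
for text in texts:
    scores = analyzer.polarity_scores(text)
    # 'compound' is a normalized valence score in [-1, 1]
    print(f"{scores['compound']:+.3f}  {text}")
```

Valence scores computed this way per participant could then be correlated with self-report, observer report, facial cues or vocal cues, which is the pattern of associations the study tests.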

Cultural nuances in subtitling the religious discourse marker wallah in Jordanian drama into English

This study examines the strategies and challenges of subtitling the religious discourse marker والله wallah (by God) in Jordanian Arabic drama on Netflix. Two works, the series Jinn (2019) and the film Theeb (2014), serve as the data corpus. The study analyses the pragmatic functions of the religious marker wallah, which Arabic speakers commonly use to swear by God in different contexts, and examines its English subtitles. The theoretical framework partially employs Vinay and Darbelnet’s (1995) literal translation and omission strategies and Baker’s (2018) translation approaches, including equivalence and paraphrase. Each occurrence of the marker is analysed qualitatively in its pragmatic context, along with its subtitling into English. The study finds that the religious marker is frequently omitted in the subtitles or rendered into various linguistic elements such as speech acts, intensifiers, emphatic expressions, filler words, and sarcastic utterances; in some instances, wallah was paraphrased or translated literally. The study concludes that techniques tailored to the function of the religious discourse marker are needed to bridge cultural and linguistic gaps and to improve the reliability and quality of rendering religious markers in audiovisual settings.

Evolutionary optimization of model merging recipes

Large language models (LLMs) have become increasingly capable, but their development often requires substantial computational resources. Although model merging has emerged as a promising, cost-effective approach for creating new models by combining existing ones, it currently relies on human intuition and domain knowledge, limiting its potential. Here we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models such as a Japanese LLM with math reasoning capabilities. Surprisingly, our Japanese math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with substantially more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally aware Japanese vision–language model generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese vision–language models. This work not only contributes new state-of-the-art models back to the open-source community but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
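
A minimal sketch of the parameter-space half of this idea appears below: a merged model is a per-layer weighted average of two parents, and a simple (1+1) evolution strategy tunes the mixing weights against a fitness score. The toy numpy “models”, the placeholder fitness function and the (1+1)-ES are all stand-ins for the paper’s real LLM checkpoints, merging operators and evolutionary optimizer.

```python
# Sketch of evolutionary merging in parameter space: a merged model is a
# per-layer weighted average of two parents, and a (1+1) evolution
# strategy tunes the mixing weights. Toy arrays and a placeholder
# fitness stand in for real LLM checkpoints and benchmark evaluation.
import numpy as np

rng = np.random.default_rng(0)
layers = ["layer0.weight", "layer1.weight"]
model_a = {k: rng.standard_normal((8, 8)) for k in layers}
model_b = {k: rng.standard_normal((8, 8)) for k in layers}

def merge(alphas):
    """Per-layer linear interpolation between the two parent models."""
    return {k: a * model_a[k] + (1 - a) * model_b[k]
            for k, a in zip(layers, alphas)}

def fitness(merged):
    # Placeholder objective; in practice this would be the merged
    # model's accuracy on a held-out benchmark suite.
    return -sum(np.abs(v).mean() for v in merged.values())

alphas = np.full(len(layers), 0.5)
best = fitness(merge(alphas))
for _ in range(200):                 # (1+1)-ES: mutate, keep if better
    cand = np.clip(alphas + rng.normal(0, 0.1, alphas.shape), 0, 1)
    score = fitness(merge(cand))
    if score > best:
        alphas, best = cand, score
print("evolved per-layer mixing weights:", np.round(alphas, 2))
```

The data-flow-space half of the approach, which reorders and recombines layers across models rather than averaging weights, is not captured by this sketch.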

Generative language models exhibit social identity biases

Social identity biases, particularly the tendency to favor one’s own group (ingroup solidarity) and derogate other groups (outgroup hostility), are deeply rooted in human psychology and social behavior. However, it is unknown whether such biases are also present in artificial intelligence systems. Here we show that large language models (LLMs) exhibit patterns of social identity bias much as humans do. By administering sentence completion prompts to 77 different LLMs (for instance, ‘We are…’), we demonstrate that nearly all base models and some instruction-tuned and preference-tuned models display clear ingroup favoritism and outgroup derogation. These biases manifest both in controlled experimental settings and in naturalistic human–LLM conversations. However, we find that careful curation of training data and specialized fine-tuning can substantially reduce bias levels. These findings have important implications for developing more equitable artificial intelligence systems and highlight the urgent need to understand how human–LLM interactions might reinforce existing social biases.
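
The sentence-completion probe is straightforward to sketch: sample continuations of an ingroup prompt (‘We are’) and an outgroup prompt (‘They are’) from a base model, then compare the valence of the continuations. The choice of gpt2 and of VADER for scoring are illustrative assumptions; the study’s own model set, prompts and classification procedure differ.

```python
# Rough sketch of the sentence-completion probe: sample continuations
# of ingroup ('We are') vs outgroup ('They are') prompts from a base
# LLM and score each continuation's valence. gpt2 and VADER are
# illustrative choices, not the study's exact models or classifier.
from transformers import pipeline
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

generator = pipeline("text-generation", model="gpt2")
analyzer = SentimentIntensityAnalyzer()

for prompt in ["We are", "They are"]:
    outs = generator(prompt, max_new_tokens=15, num_return_sequences=5,
                     do_sample=True, pad_token_id=50256)
    vals = [analyzer.polarity_scores(o["generated_text"])["compound"]
            for o in outs]
    print(f"{prompt!r}: mean valence {sum(vals) / len(vals):+.3f}")
```

Ingroup favoritism and outgroup derogation would show up in this setup as systematically more positive ‘We are’ continuations and more negative ‘They are’ continuations, aggregated over many samples.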
