Adaptation of the virtual assessment of mentalizing ability and evaluation of its utility and psychometric properties in Chinese individuals on the schizophrenia spectrum

Introduction
Theory of mind (ToM), the ability to attribute mental states (emotion, intention, and thoughts) to other individuals, is useful in explaining and predicting human behavior1. It is a multidimensional construct classified according to types (cognitive vs. affective) and orders (first vs. second)2,3. Affective ToM (inferring others’ emotions) requires more personal engagement than cognitive ToM (inferring others’ thoughts and intentions)2,4. Second-order ToM (reasoning of how another person thinks about a third person’s mental states) requires high-level cognitive processes and is more complex than first-order ToM (inferring the thoughts of another person)3,4.
ToM impairment has been found in people with schizophrenia spectrum disorders5. Schizophrenia is a serious chronic psychiatric disorder with a global prevalence of 1%6. During social interactions, people with schizophrenia commonly present difficulties with emotion recognition, ToM, and empathy5. In particular, accumulating evidence has consistently shown that ToM is impaired across different stages of schizophrenia spectrum disorders, including in at-risk individuals with schizotypal traits or social anhedonia (SA), a trait whereby individuals are unable to take pleasure from social and/or interpersonal relationships7,8,9,10,11,12,13. Notably, negative symptoms, such as SA and asociality, are more strongly associated with ToM impairment than other symptoms of schizophrenia14,15. Thibaudeau et al.14 suggested that clinical symptoms affect how an individual perceives and interprets information, thereby affecting the underlying mechanisms of ToM. Research16,17 has demonstrated that individuals with high levels of SA tend to score significantly lower on the Hinting Task, a mentalizing measure that assesses participants’ understanding of indirect speech18, but inconsistent results19 have been observed with the Reading the Mind in the Eyes Test, another mentalizing assessment tool that requires participants to infer a person’s emotional state from photographs of their eyes20.
In recent years, researchers have attempted to use a more sophisticated approach to assess ToM21,22. Frith introduced different types of ToM errors, i.e., “hypermentalizing” (attributing a mental state when none is present), “reduced ToM” (failing to identify a mental state), and “no ToM” (a complete absence of mental inference)20. These distinctions are crucial, as evidence suggests significant associations between error types and clinical symptoms4,22,23,24. However, most ToM instruments do not assess the four subconstructs and error types of ToM. In addition, researchers have questioned the validity of the classic measures of ToM, particularly in ecological validity1,25, making it challenging to understand the level of ToM impairment in schizophrenia patients or individuals with high SA.
The Virtual Assessment of Mentalizing Ability (VAMA) is an ecological ToM assessment tool recently developed in Australia. It was designed to assess the four subconstructs of ToM separately and includes error scores for the three types of ToM errors26. The VAMA showed acceptable to satisfactory reliability and validity (sensitive to ToM impairment in people with early and chronic schizophrenia)4,26,27. However, considering the cultural specificity of social cognitive processes, the VAMA, which was developed for use in Western culture, may not be suitable for non-Western cultures, such as Chinese culture28,29. Therefore, the present study aimed to (1) adapt the VAMA for use in the Chinese context and evaluate its psychometric properties in 100 healthy individuals (Study 1) and (2) examine the utility and sensitivity of this instrument in patients with schizophrenia and at-risk individuals (Study 2). Additionally, intelligence has been found to be strongly associated with ToM (having reasoning skills and appreciating the mental states of others) among patients with schizophrenia or psychosis30,31; thus, in this study, the effect of IQ was controlled when examining the differences between the groups.
The primary aim of this study was to adapt and validate the VAMA for use in the Chinese context. We hypothesized that the adapted version would demonstrate psychometric properties comparable to the original Australian version26. Specifically, we hypothesized that this version would show: 1a) satisfactory test–retest reliability, 1b) mean accuracy on the four subconstructs comparable to that of the original version, 1c) strong correlations between items and their corresponding subscales, indicating internal consistency, 1d) neither floor nor ceiling effects, and 1e) significant main effects of Order and Type, with better performance on first-order than second-order ToM and on cognitive than affective ToM (construct validity). For Study 2, we hypothesized that schizophrenia patients would 2a) show poorer performance in the subconstructs of the VAMA and 2b) make more errors in all three types of ToM errors than the healthy controls, and that 2c) associations would be found between schizophrenia clinical symptoms and the error types of the VAMA. For the individuals with a high level of SA, we also hypothesized that they would 2d) show poorer performance in the subconstructs of the VAMA and 2e) make more errors in all three types of ToM errors than those with low SA.
Study 1
Method
Participants
A total of 100 healthy young participants were recruited in Beijing, China (Mage = 23.49 years, SD = 1.13, range = 22–28; Myears of education = 16.85 years, SD = 0.82, range = 13–19; 50% male, see Table 1). The exclusion criteria were as follows: (1) a medical history of brain injury or psychiatric or neurological disorders; (2) a history of substance abuse or addiction; (3) a family history of psychiatric or neurological disorder; and (4) inability to read or understand Chinese. Participants were recruited from the general community through convenience sampling and snowballing techniques. Of the 100 participants, 30 were randomly selected to participate in re-testing. The participants received RMB110 after completing the study. This study was approved by the Human Subjects Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences in Beijing (reference number: H20045). Written informed consent was obtained from each participant before they enrolled in the study.
Measurement
The adapted version of the VAMA
The VAMA version used in this study is an adaptation of the original VAMA26 for use in the Chinese context. While keeping the characters’ basic needs, framework, and task administration the same as in the original VAMA, we decided to change the perspective from the “first-person” to a “third-person” (i.e., outside observer) and replace the virtual game environment navigation with two-dimensional (2D) non-immersive virtual reality (VR) videos for two reasons. (1) The third-person perspective is more in line with the mechanism of ToM. Vogeley and his colleagues32 investigated the neural mechanisms underlying the first-person and third-person perspectives. Although common activations for both perspectives were found in the occipital, parietal, and prefrontal areas, the “third-person perspective” showed increased activity in the precuneus, right superior parietal, and right premotor cortex, while the “first-person perspective” showed increased activity in the mesial prefrontal cortex, posterior cingulate cortex, and superior temporal cortex bilaterally. Both the “first-person perspective” and the “third-person perspective” rely on egocentric reference frames. However, the “third-person perspective” requires an additional translocation of the egocentric viewpoint, leading to differential neural processes. Therefore, we believe that there will be differences between the original VAMA and the current adapted version. Brown33 also showed that the “first-person” perspective is about the self, known as egocentric, whereas a “third-person” perspective involves the perception of others from an external frame of reference, regarded as allocentric. Since ToM is the process of adopting the third-person perspective to infer the mental state of others, we decided to adopt the “third-person” instead of the “first-person” perspective in this study.
(2) The original VAMA involves a shopping task that places considerable demands on the executive function of the participants, thereby increasing their cognitive load. This heightened cognitive demand can detract from the primary focus of our study, which is to assess mentalizing ability. To align with the main objective of our research, we adopted a methodology similar to that used in other well-known ToM tasks, e.g., the Movie for the Assessment of Social Cognition (MASC)34. Consequently, our study exclusively used ToM video clips and corresponding ToM questions. This approach ensured that the participants’ cognitive resources were directed toward evaluating their mentalizing abilities, rather than being diverted by tasks that require extensive executive functioning. By focusing on ToM-specific stimuli, we aimed to obtain a more accurate assessment of the participants’ mentalizing capabilities.
New scenarios and ecological validity
The original storyline of the VAMA was developed within a Western cultural framework, which may not be entirely suitable for the Chinese context. As Van de Vijver35 highlighted, norm-driven adaptations are essential to accommodate cultural differences in norms, values, and practices, as specific cultural contexts may not universally apply. Social norms influence not only individual actions but also the expectations others have of those actions, potentially affecting the inference of mental states. For instance, cultural variations are evident in how faux pas are perceived; mistaking a customer for a waiter is considered a faux pas by all English participants (100%), but by only 65.4% of Canadian participants. Similarly, while 100% of English participants view giving up a seat to an older passenger on a bus as normal behavior, 21.2% of Chinese participants perceive it as a faux pas, despite it being a control item with no faux pas intended. These variations underscore the need for concept-driven adaptations rather than literal translations of neuropsychological assessments36.
As mentioned, the storyline of the VAMA26 was developed based on Western culture, which may not be appropriate for the Chinese context. Thus, after seeking advice from two subject-matter investigators (YC and DS, both co-authors of the current study and neuropsychologists), some videos that were not culturally suitable were removed (#1, #3, #4, #6, #7, #9, and #10 from the original VAMA)26. For example, in scenario #3 from the original VAMA, the test-taker and Character B meet Character A at the hairdresser, where she has just had her hair cut. Character B makes a friendly joke about Character A’s hair, saying “You need to get a new hairdresser. That looks awful.” While this scenario is considered normal/appropriate in Western countries, it is unusual for friends in China to make this kind of joke with each other. Some new scenarios (control #3, control #4, #3, #4, #5, #6, #7, #8, and #10 from the current adapted version of the VAMA) were created based on the VAMA framework, capturing the Chinese collectivistic cultures of obedience to authority, filial piety, ancestral worship, conservatism, and perseverance37,38. To ensure that the created scenarios were culturally sensitive and ecologically valid, pilot data were obtained from 23 volunteers (Mage = 23.57 years, SD = 2.46), who were asked for comments and opinions on the scenarios.
Ecological validity refers to the extent to which outcomes derived from controlled experimental conditions correspond to the performance observed in real-world settings39. However, as mentioned by Cavieres et al.39, few ToM assessments of schizophrenia have explicitly defined ecological validity. Thus, to ensure that the scenarios relate to young adults’ daily life events, and to improve the verisimilitude and ecological validity of the stimuli, three questions, one each on familiarity, understanding, and relatedness, were designed. The volunteers were recruited to read all 10 scenarios and then asked to provide comments on the scripts and rate each scenario based on the level of familiarity (Question: “How familiar are you with this scenario?” rated from 1 = “Not familiar at all” to 7 = “Very familiar”), understanding (Question: “To what extent do you believe you understand the thoughts or emotions of these story characters at that time?” rated from 1 = “Do not understand at all” to 7 = “Understand very well”), and more specifically, relatedness to determine how closely the scenarios mirror real-life situations and ensure that the scenarios are relevant to their (or their peers’) real-life experiences (Question: “How closely does this scenario align with the lives of young people (aged 18–30)?” rated on a 7-point Likert scale from 1 = “Not close at all” to 7 = “Very closely aligned”). The volunteers reported that most of the scenarios were familiar to them (M = 4.37, SD = 1.04), easy to understand (M = 4.93, SD = 1.13), and close to their daily lives (M = 4.70, SD = 1.10). Scripts were also provided to four subject-matter experts (YC, YW, RC, and DS) to validate the content of the Chinese version. They are the co-authors of the current study, are experienced in developing assessment tools, conducting social cognition studies and have published papers in social cognition and mental disorders. 
After receiving all of the comments, appropriate revisions were made, and the final scripts were used in the video filming. Appendix A (Supplementary Materials) shows an example scenario of the VAMA, and Appendix B (Supplementary Materials) shows one of the sample questions.
A professional technical production team was employed to carry out the filming at The Hong Kong Polytechnic University. The first version of the scenarios was recorded on a green screen and then transformed into a digital format. Editing and postproduction were completed using Adobe Premiere Pro® (Adobe®, USA). However, unlike the original version26, the current version did not include a virtual shopping environment. Because verisimilitude and authenticity may bias participants’ judgments40, we decided to base the video backgrounds on a real-world environment instead of building a virtual shopping center environment in the current study. This setting was also chosen with future usability in other Asian countries in mind: the background that we adopted is comparatively easy to replace with another background, whereas the virtual shopping environment (Canty’s) was developed based on an Australian shopping mall that people from other countries may not be familiar with.
After obtaining the permission of a local shopping mall, the second co-first author and two programmers took photos of the mall and built the video background. Thereafter, the research team used E-Prime 3.0 to develop the stimuli. The filmed segments, including the video clips and ToM questions, were merged, and their quality was checked before data collection. Unlike the original version26, this adapted version did not involve any shopping tasks. The shopping task in the original VAMA requires significant executive function, thereby increasing participants’ cognitive load, whereas the primary aim of this study was to assess mentalizing ability. To focus on this, we adopted a methodology similar to that of other widely used ToM assessment tools (e.g., MASC)34, using only ToM video clips and questions in the current study. This approach ensures that participants’ cognitive resources are devoted to their mentalizing abilities, rather than being distracted by tasks demanding executive functioning. The participants completed the task (i.e., watching the video clips and answering the ToM questions) on a laptop, which had the tool installed. To enhance the experimental experience and minimize disturbances from external noise, we provided all participants with over-ear headphones. These headphones are designed to fit snugly over the head and ears, offering sound isolation and noise-canceling capabilities. The resulting auditory environment allowed the participants to immerse themselves fully in the task at hand. Thereafter, to ensure the feasibility and effectiveness of the experiment, we conducted a pilot study in mainland China to collect preliminary data. Pilot data were obtained from 66 volunteers in Weifang (a prefecture-level city in China’s central Shandong province; n = 35, Mage = 20.45 years, SD = 0.86) and Beijing (the capital of China; n = 31, Mage = 22.97 years, SD = 1.25).
This initial phase allowed us to test our experimental design, procedures, and materials in a real-world setting, providing valuable insights into potential challenges and areas for refinement. Although we did not perform data analysis on these pilot data sets, the process was instrumental in assessing the practicality of our approach and identifying necessary adjustments before proceeding to the full-scale study.
VAMA scoring
Following the original VAMA version, the participants’ responses were scored dichotomously: each “Accurate mentalizing” response was rated as 1, and others were rated as 0. In addition, “No ToM,” “Hypermentalizing,” and “Reduced ToM” were scored dichotomously to compare the frequencies of these error types. Scores for each subscale and error type ranged from 0 to 10, and the total accurate score ranged from 0 to 40.
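The scoring rule described above can be illustrated with a minimal sketch. The label strings, the data layout (one subscale label and one response label per question), and the function name are hypothetical choices made for illustration and are not taken from the VAMA materials; the sketch simply counts accurate responses per subscale and the frequency of each error type.

```python
# Hypothetical labels; the actual VAMA response coding may differ.
SUBSCALES = ("first_cognitive", "first_affective",
             "second_cognitive", "second_affective")
ERROR_TYPES = ("hypermentalizing", "reduced_tom", "no_tom")


def score_vama(responses):
    """Score a list of (subscale, response_type) pairs, one pair per question.

    An "accurate" response adds 1 to its subscale; any other response adds 0
    to the subscale and 1 to the count of the corresponding error type.
    """
    subscale_scores = {name: 0 for name in SUBSCALES}
    error_counts = {name: 0 for name in ERROR_TYPES}
    for subscale, response_type in responses:
        if subscale not in subscale_scores:
            raise ValueError(f"unknown subscale: {subscale}")
        if response_type == "accurate":
            subscale_scores[subscale] += 1
        elif response_type in error_counts:
            error_counts[response_type] += 1
        else:
            raise ValueError(f"unknown response type: {response_type}")
    total_accuracy = sum(subscale_scores.values())
    return subscale_scores, error_counts, total_accuracy
```

With 10 questions per subscale, each subscale score falls between 0 and 10 and the total accuracy score between 0 and 40, matching the ranges stated above.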
Procedures
After providing informed consent, eligible participants were invited to complete the computer-based VAMA and the questionnaires via WenJuanXing (Changsha Ranxing Information Technology Co., Ltd., Hunan, China), which is a data science company that specializes in managing a personal information database containing details, such as age, gender, and residence, for over 2.6 million Chinese residents. The platform enables research teams to distribute questionnaires online and recruit specific participants based on chosen sampling methods. WenJuanXing has been utilized in many studies to gather representative samples and conduct cross-sectional online surveys41,42. To examine test–retest reliability, 30 participants were randomly selected and asked to complete the VAMA again approximately 4 weeks (i.e., ±1 day) after the first testing session.
Data analyses
IBM SPSS 29.0.0.0 was used to analyze the collected data. Item difficulty (the mean accuracy of each item) and the item-total correlation for each item were calculated to examine the internal consistency of the adapted version of the VAMA, and Pearson’s correlation coefficients were calculated to examine its test–retest reliability. To evaluate construct validity, a 2 × 2 repeated-measures analysis of variance (ANOVA) was used to investigate the effects of Order (viz., first- vs. second-order ToM) and Type (viz., cognitive vs. affective ToM) on ToM performance. All data (de-identified and anonymous), analysis code, and research materials are available upon request from the corresponding author.
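For readers who wish to reproduce the item-level statistics outside SPSS, the following is a minimal sketch of the two quantities. It assumes dichotomous (0/1) item scores arranged as one row per participant for a single subscale, and it correlates each item with the uncorrected subscale total; whether the original analysis used corrected or uncorrected totals is not stated, so this choice is an assumption. The function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(item_scores):
    """Return (difficulty, item-total r) for each item of one subscale.

    item_scores: list of rows, one per participant, each a list of 0/1 scores.
    Difficulty is the proportion of participants answering the item correctly.
    The total is the uncorrected subscale sum (an assumption, see lead-in).
    """
    totals = [sum(row) for row in item_scores]
    n_items = len(item_scores[0])
    stats = []
    for j in range(n_items):
        column = [row[j] for row in item_scores]
        difficulty = sum(column) / len(column)
        stats.append((difficulty, pearson_r(column, totals)))
    return stats
```

An item answered identically by every participant has zero variance, so its correlation is undefined and the sketch would raise a division error; such items would need to be handled separately.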
Results
Mean scores, item difficulty, and item-total correlation
The descriptive statistics for each VAMA subscale are shown in Fig. 1 and Table 2. Item difficulty with the four subscales was as follows: first-order cognitive ToM = 0.40–0.84, first-order affective ToM = 0.24–0.89, second-order cognitive = 0.11–0.81, second-order affective = 0.13–0.65. The item-total correlations were statistically significant, with either pFDR < 0.05 or pFDR < 0.01. These results suggest that all of the items measured their respective subscales (see Table 2).
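The pFDR values above denote p-values corrected for the false discovery rate. The text does not name the correction procedure, so the sketch below assumes the widely used Benjamini–Hochberg step-up method; it is an illustration of that method and not necessarily the exact routine applied in the analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each raw p-value is multiplied by m / rank (rank in ascending order),
    and monotonicity is enforced by taking a running minimum from the
    largest rank downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

An item-total correlation would then be reported as significant at pFDR < 0.05 when its adjusted value falls below 0.05.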

Fig. 1. Mean response accuracy on the subscales of the VAMA. Error bars represent standard error.
Construct Validity
A significant main effect of Order was found (F(1, 99) = 39.04, p < 0.001, ηp2 = 0.28), where the participants showed better performance in first-order ToM (M = 5.90, SD = 1.19) than in second-order ToM (M = 4.95, SD = 1.23). The results also revealed a significant main effect of Type (F(1, 99) = 20.76, p < 0.001, ηp2 = 0.17), where the participants performed better in cognitive ToM (M = 5.71, SD = 1.16) than in affective ToM (M = 5.14, SD = 1.11). The two-way Order × Type interaction was not statistically significant (F(1, 99) = 0.008, p > 0.93).
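The reported effect sizes can be checked against the F statistics using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this identity to the two main effects above; the function name is illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of Order: F(1, 99) = 39.04
print(round(partial_eta_squared(39.04, 1, 99), 2))  # 0.28
# Main effect of Type: F(1, 99) = 20.76
print(round(partial_eta_squared(20.76, 1, 99), 2))  # 0.17
```

Both values agree with the effect sizes reported for Order and Type.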
Test–retest reliability
The adapted version of the VAMA was readministered to a randomly selected subsample of 30 individuals 4 weeks after the initial test. Positive correlations were found between the test and retest scores, although not all of them reached statistical significance: first-order cognitive ToM: r = 0.568, pFDR < 0.01; first-order affective ToM: r = 0.326, pFDR > 0.05; second-order cognitive ToM: r = 0.296, pFDR > 0.05; second-order affective ToM: r = 0.380, pFDR < 0.05; and total score: r = 0.731, pFDR < 0.01 (see Table 3).
Study 2
Method
Participants
Sample 1: Schizophrenia patients and healthy matched controls
To examine the clinical utility of the adapted version of the VAMA, 39 schizophrenia patients (Mage = 29.13 years, SD = 9.11, age range = 18–51) were recruited from Peking University Sixth Hospital, and 37 healthy matched controls (Mage = 27.57 years, SD = 8.04, age range = 19–50) were recruited from the community in Beijing, China. The clinical group included 29 inpatients and 10 outpatients who met the diagnostic criteria for schizophrenia according to the Fifth Edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5)43, with no specified limit for illness duration. The exclusion criteria for all participants were an IQ below 70, a history of substance abuse or drug addiction, a record of brain trauma or neurological disorders, and a history of electroconvulsive therapy within the past 2 months. For the healthy matched control group, only those with no psychiatric disorders and no family history of psychiatric or neurological disorders were recruited. The Chinese version of the Mini International Neuropsychiatric Interview (MINI)44 was used to screen individuals in this group for psychiatric disorders.
Table 1 shows the demographic information of the two groups. Nonsignificant between-group differences were found in gender and age (p > 0.05), while significant between-group differences were found in education years (p < 0.01) and estimated IQ scores (p < 0.001). This study was approved by the Ethics Committee of the Peking University Sixth Hospital (reference number: 2021–24). Written informed consent was obtained from all participants before data collection.
Sample 2: Individuals with high and low levels of SA
An extreme group design was adopted to explore whether the adapted version of the VAMA can detect ToM impairment in people with high levels of SA. The participants were divided into two groups: those with high and low SA scores. The initial pool of 13,709 individuals consisted of students who participated in a large-scale survey activity in Shandong, China, during which they completed the Revised Social Anhedonia Scale (CSAS)43. This pool of people was not specifically recruited for our experiment but served as “potential participants.” Based on the sample sizes of previous studies that also adopted the high and low SA group design44,45, we randomly invited 174 individuals from this pool, of whom 126 were willing to participate in this study. Upon arrival for the study, participants were asked to complete the CSAS again to ensure that their scores still met the inclusion criteria for the high or low SA groups. The exclusion criteria were identical to those in Study 1. Only those who continued to meet these criteria were included in the final sample of 102 participants. In accordance with the criteria of a previous study on SA46, individuals were allocated into the high SA group (n = 48) if their CSAS score was at least 1 standard deviation (SD = 6.67) above the mean score (M = 10; cut-off score ≥ 17), whereas individuals whose CSAS score was equal to or lower than the mean score were allocated into the low SA group (n = 54; cut-off score ≤ 10). The high SA group had a mean age of 19.83 years (SD = 0.72) and consisted of 25 male and 23 female participants. The low SA group had a mean age of 19.81 years (SD = 0.70) and consisted of 22 male and 32 female participants (see Table 1). No significant between-group differences were found in gender, age, and education years (p > 0.05). This study was approved by the Ethics Committee of the Weifang Medical University (reference number: H20045).
We obtained written informed consent from all participants before data collection. All participants received RMB110 after completing the whole study.
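The extreme-group allocation rule for sample 2 can be expressed as a short sketch. The mean (10) and standard deviation (6.67) are the values reported above; the function name and the returned labels are illustrative assumptions. Scores falling strictly between the two cut-offs belong to neither group.

```python
def allocate_sa_group(csas_score, mean=10.0, sd=6.67):
    """Allocate a participant to the high or low SA group, or to neither.

    High SA: score at least 1 SD above the mean (10 + 6.67 = 16.67, so an
    integer CSAS score of 17 or more). Low SA: score at or below the mean.
    """
    if csas_score >= mean + sd:
        return "high"
    if csas_score <= mean:
        return "low"
    return None  # between the cut-offs: not eligible for either group
```

For example, a score of 17 is allocated to the high SA group and a score of 10 to the low SA group, while scores of 11 to 16 are not allocated.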
Measurements
VAMA task
The adapted version of the VAMA task, as described in Study 1, was also used in Study 2.
Clinical assessment
The severity of clinical symptoms in schizophrenia patients (sample 1) was assessed using the Positive and Negative Syndrome Scale (PANSS) developed by Kay et al.47. The Chinese version of the PANSS was used in the current study48. Estimated IQ was measured using the Wechsler Adult Intelligence Scale-Revised Chinese version (WAIS-RC)49.
Revised SA scale
The Revised Social Anhedonia Scale (CSAS)50 was administered to the participants (sample 2) to measure their hedonic experience. The CSAS assesses the reduction in the level of hedonic experience in interpersonal relationships and communications using 40 true–false items in clinical or nonclinical settings. Higher scores indicate more severe SA. The Chinese version of the CSAS has been used and validated in China and has shown good internal consistency (Cronbach’s alpha = 0.86)43.
Chinese version of the Yoni task
To examine the group difference between patients with schizophrenia and healthy matched controls (sample 1), another ToM assessment tool, the Chinese version of the computerized Yoni task51,52 was used. In total, 98 trials were conducted for this task, comprising 32 and 66 first-order and second-order ToM questions, respectively. The participants were asked to infer the thoughts (cognitive ToM), feelings (affective ToM), and position (physical condition, control condition) of “Yoni” (Chinese name is Xiaoming) according to his facial expression. The Chinese version has been applied to schizophrenia patients in previous studies53.
Procedures
A psychiatrist evaluated the eligibility of all participants in sample 1 after they had signed the informed consent form. The psychiatrist also interviewed the healthy participants to rule out mental illness. The study comprised three parts: a semi-structured interview, including the PANSS (for the clinical group) or MINI (for healthy participants); self-reported surveys; and experimental tasks (VAMA and Yoni task). The two ToM tasks were administered last by trained research assistants and took around 1.5–2 h. The participants in sample 2 were divided into the high- and low-level SA groups based on their CSAS scores. Eligible participants were then invited to complete the adapted version of the VAMA only. The entire process took 30 min to 1 h.
Data analysis
Studies have reported associations between premorbid intelligence and ToM, and significant correlations were found between the estimated IQ scores and performance on the VAMA and Yoni task (see Table 4). Therefore, analyses of covariance (ANCOVAs) were adopted to analyze the data. Two 2 (Schizophrenia vs. Healthy Matched Control group) × 2 (first- vs. second-order ToM) × 2 (cognitive vs. affective ToM) mixed ANCOVAs were conducted, one for the VAMA and one for the Yoni task. Independent samples t-tests were also conducted to examine between-group differences in the four subconstructs (for the VAMA and Yoni task) and three error types. As we proposed directional hypotheses, we evaluated one-tailed rather than two-tailed p-values54. For sample 2, similar data analyses were adopted for the adapted VAMA. Finally, Pearson’s correlations were calculated to examine the association between the clinical symptoms of schizophrenia patients and the frequency of error types on the adapted version of the VAMA. All data, analysis code, and research materials are available upon request from the corresponding author.
Results
Sample 1: Schizophrenia versus healthy matched control group
VAMA Task
As expected, a significant main effect of Group was found (F(1, 73) = 37.23, p < 0.001, ηp2 = 0.34), where overall, the clinical group (M = 4.23, SD = 1.50) performed significantly more poorly than the healthy matched control group (M = 5.33, SD = 1.55), as shown in Appendix C. The main effects of Order (F(1, 73) = 1.51, p > 0.05) and Type (F(1, 73) = 1.92, p > 0.05) were not significant. Notably, a significant two-way Order × Type interaction (F(1, 73) = 4.41, p < 0.05, ηp2 = 0.06) was observed. In cognitive ToM, participants performed better in first-order ToM (M = 5.31, SD = 1.47) than in second-order ToM (M = 4.49, SD = 1.68; F(1, 73) = 10.23, p = 0.002, ηp2 = 0.12), and in affective ToM, participants also performed better in first-order (M = 4.94, SD = 1.50) than in second-order ToM (M = 4.39, SD = 1.77; F(1, 73) = 5.24, p = 0.03, ηp2 = 0.07). In both first- and second-order ToM, the simple main effects of Type were not significant (p > 0.05). The three-way Order × Type × Group interaction (F(1, 73) = 0.50, p > 0.05) and the two-way Order × Group (F(1, 73) = 3.68, p > 0.05) and Type × Group (F(1, 73) = 0.01, p > 0.05) interactions were not statistically significant.
Planned comparisons using t-tests were conducted to examine between-group differences in all four subconstructs of the adapted version of the VAMA. The clinical group showed significantly worse performance on all subconstructs [viz., first-order cognitive (t(74) = 5.60, p(one-tailed) < 0.001, d = 1.28), first-order affective (t(74) = 4.44, p(one-tailed) < 0.001, d = 1.02), second-order cognitive (t(74) = 2.92, p(one-tailed) = 0.002, d = 0.67), and second-order affective ToM (t(74) = 2.54, p(one-tailed) = 0.007, d = 0.58); also see Appendix C and Appendix D]. In addition, patients in the schizophrenia group showed significantly more “hypermentalizing” errors (t(68.53) = 3.87, p(one-tailed) < 0.001, d = 0.88) and “no mentalizing” errors (t(74) = 3.57, p(one-tailed) < 0.001, d = 0.81) than did the healthy control group, while a non-significant between-group difference was found in “reduced ToM” errors (t(74) = 1.04, p(one-tailed) = 0.152; see Table 5).
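For the equal-variance t-tests above (df = 74), the reported effect sizes can be approximately recovered from the t statistics and the group sizes (39 patients and 37 controls) using the identity d = t × sqrt(1/n1 + 1/n2). The sketch below illustrates this check; it is not the authors’ computation, which presumably used the raw means and pooled standard deviations, and the function name is illustrative.

```python
from math import sqrt


def cohens_d_from_t(t_value, n1, n2):
    """Cohen's d implied by an equal-variance independent-samples t statistic."""
    return t_value * sqrt(1 / n1 + 1 / n2)


# First-order affective ToM: t(74) = 4.44 with n1 = 39 and n2 = 37
print(round(cohens_d_from_t(4.44, 39, 37), 2))  # 1.02
# Second-order cognitive ToM: t(74) = 2.92
print(round(cohens_d_from_t(2.92, 39, 37), 2))  # 0.67
```

Because the published t values are rounded to two decimals, the recovered d can differ from the reported value in the last digit.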
Table 6 summarizes the correlations between clinical symptoms and mentalizing error types in patients with schizophrenia. A significant positive association was found between negative symptoms and “hypermentalizing” errors (r = 0.388, pFDR < 0.05), but none of the other symptom–error type associations were found to be statistically significant (p > 0.05).
Yoni task
The mixed ANCOVA results showed a significant main effect of Order and a significant Order × Type interaction (also see Appendix E). First, a significant main effect of Order was found (F(1, 70) = 17.25, p < 0.001, ηp2 = 0.20), where the participants showed better performance in first-order ToM (M = 0.86, SD = 0.24) than in second-order ToM (M = 0.74, SD = 0.18).
The main effect of Type was not significant (F(1, 70) = 1.08, p > 0.05). Surprisingly, the main effect of Group on Yoni task performance was also not significant (F(1, 70) = 3.03, p > 0.05). Similar to the VAMA results, a significant two-way Order × Type interaction was found (F(1, 70) = 4.35, p < 0.05, ηp2 = 0.06); specifically, in cognitive ToM, the participants performed better in first-order ToM (M = 0.87, SD = 0.24) than in second-order ToM (M = 0.71, SD = 0.22, F(1, 70) = 57.36, p < 0.001, ηp2 = 0.45), and in affective ToM, the participants also performed better in first-order ToM (M = 0.86, SD = 0.24) than in second-order ToM (M = 0.77, SD = 0.17, F(1, 70) = 21.49, p < 0.001, ηp2 = 0.24). In second-order ToM, the participants performed better in affective ToM (M = 0.77, SD = 0.13) than in cognitive ToM (M = 0.71, SD = 0.13, F(1, 70) = 13.92, p < 0.001, ηp2 = 0.17), while the simple main effect of Type was not significant in first-order ToM. Similar to the VAMA results, the three-way Order × Type × Group interaction (F(1, 70) = 0.31, p > 0.05) and the two-way Order × Group (F(1, 70) = 0.013, p > 0.05) and Type × Group (F(1, 70) = 0.00, p > 0.05) interactions were not statistically significant. Four independent samples t-tests were then performed to compare the four subconstructs of ToM between the two groups. The clinical group showed significantly worse performance than the healthy matched control group in all of the subconstructs [viz., first-order cognitive (t(58.26) = 2.20, p(one-tailed) = 0.016, d = 0.51), first-order affective (t(55.86) = 2.07, p(one-tailed) < 0.02, d = 0.48), second-order cognitive (t(64.41) = 3.55, p(one-tailed) < 0.001, d = 0.83), and second-order affective ToM (t(61.76) = 4.32, p(one-tailed) < 0.001, d = 1.01); also see Appendix E].
Sample 2: High level versus low level of SA
VAMA Task
The mixed ANOVA results revealed two significant main effects (see Appendix F, Figs. 1 and 2). First, a main effect of Order was observed (F(1, 100) = 20.28, p < 0.001, ηp2 = 0.17); specifically, the participants performed significantly better in first-order (M = 5.65, SD = 1.12) than in second-order ToM (M = 5.01, SD = 1.30). Second, a significant main effect of Type was also found (F(1, 100) = 14.26, p < 0.001, ηp2 = 0.13), where the participants performed significantly better in cognitive ToM (M = 5.58, SD = 1.25) than in affective ToM (M = 5.08, SD = 1.13). The main effect of Group (F(1, 100) = 0.88, p = 0.35) and the interaction effects of Order × Group (F(1, 100) = 2.40, p = 0.124), Order × Type (F(1, 100) = 0.04, p = 0.85), Type × Group (F(1, 100) = 1.28, p = 0.26), and Order × Type × Group (F(1, 100) = 0.00, p = 0.98) were not significant.
To examine whether the two groups differed in the four subconstructs of ToM, planned comparisons using t-tests were performed. A significant between-group difference was found in second-order cognitive ToM (t(100) = 1.75, p(one-tailed) = 0.04, d = 0.35), where the participants with a high level of SA (M = 5.00, SD = 1.44) scored lower than their low SA counterparts (M = 5.56, SD = 1.72; see Table 7). Finally, no significant differences in the mentalizing error types were observed between the two groups [“hypermentalizing”: t(100) = 0.68, p(one-tailed) = 0.25, d = 0.14; “reduced ToM”: t(100) = 0.98, p(one-tailed) = 0.16, d = 0.20; “no ToM”: t(100) = 0.41, p(one-tailed) = 0.34, d = 0.08; see Table 7].
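The effect sizes for these planned comparisons can be related to the reported t statistics and descriptive statistics through standard conversions. The Python sketch below assumes two independent groups of equal size. The group sizes are not stated in this section, so this is an assumption, and the function names are hypothetical rather than the authors' code. Under that assumption, Cohen's d is approximately 2t/√df, and the pooled SD is the root mean square of the two group SDs.

```python
import math


def cohens_d_from_t(t_value: float, df: float) -> float:
    """Approximate Cohen's d from an independent-samples t statistic.

    Assumes equal group sizes (an assumption, not stated in the text),
    in which case d is approximately 2 * t / sqrt(df).
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    return 2.0 * t_value / math.sqrt(df)


def cohens_d_from_means(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d from two group means and SDs, pooling SDs under equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    if pooled_sd == 0:
        raise ValueError("pooled SD must be non-zero")
    return (mean_a - mean_b) / pooled_sd


# Second-order cognitive ToM: t(100) = 1.75; low SA M = 5.56, SD = 1.72; high SA M = 5.00, SD = 1.44.
print(round(cohens_d_from_t(1.75, 100), 2))
print(round(cohens_d_from_means(5.56, 1.72, 5.00, 1.44), 2))
```

Both routes give a value of about 0.35 for second-order cognitive ToM, in line with the reported d = 0.35. The same t-based conversion reproduces the reported effect sizes for the three error types to two decimal places.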
General discussion
In the current study, we hypothesized that the adapted version would demonstrate psychometric properties comparable to those of the original Australian version26, including satisfactory test–retest reliability, comparable mean accuracy across the four subconstructs, strong item-total correlations, and the absence of floor and ceiling effects, and that participants would perform better in first-order than in second-order ToM and in cognitive than in affective ToM. Results from Study 1 provided evidence to support the adaptation of the VAMA for the Chinese culture, with acceptable psychometric properties. To further support its validity, we also hypothesized that schizophrenia patients and individuals with high levels of SA would show poorer performance in the VAMA and make more ToM errors. In addition, we expected to find associations between clinical symptoms and the VAMA error types in schizophrenia patients. Results from Study 2 demonstrated that schizophrenia patients showed impairment on all four subconstructs of ToM and made more hypermentalizing and no-mentalizing ToM errors. On the other hand, individuals with high levels of SA showed impairment in only one of the subconstructs of ToM (viz., second-order cognitive ToM), and they did not make more ToM errors of any type than individuals with low levels of SA.
In light of the findings from Study 1, the adapted Chinese version of the VAMA is broadly similar in psychometric properties to the original Australian version26 (see Appendix G in Supplementary materials). We removed some scenarios that appeared more specific to Western culture and added some new scenarios and items that were appropriate for the Chinese context, and still found acceptable internal consistency for this adapted version. Consistent with our hypothesis, main effects of Order and Type were observed in the ANOVA results, where first-order ToM was found to be easier than second-order ToM, and cognitive ToM was easier than affective ToM. These results are consistent with previous studies11,26, supporting the construct validity of this adapted version of the VAMA. However, it should be noted that the test–retest reliability for the four subconstructs requires further study. Nevertheless, the test–retest reliability findings were similar to those of other ToM tools, such as The Awareness of Social Inferences Test (TASIT, test–retest r = 0.534)55, the Hinting Task (test–retest r = 0.509)18, and the Social Attribution Task-Multiple Choice (SAT-MC, test–retest r = 0.554)56,57,58. Although most of the ToM tools were found to have adequate to satisfactory known-group validity, their internal consistency and test–retest reliability in non-clinical populations were not “satisfactory” by the standards of classical test theory59. One possible reason is that the complex format of the constructs reduces their reliability58. Thus, it is not surprising that second-order ToM in our current study, which requires higher and more complex reasoning skills, has relatively low reliability compared with the simpler first-order ToM. Another reason may be that the test was constructed with items of diverse levels of difficulty (two/three/four characters in a scenario), which could have contributed to reduced internal consistency60.
Previous story-based and film-based studies also showed similar levels of internal consistency, for example the “Strange Stories Film Task” (Cronbach’s alpha = 0.58 for Intention, 0.42 for Mental state language question, and 0.73 for Interaction question)60 and the original VAMA (first-order cognitive: Cronbach’s alpha = 0.69)26.
In Study 2, the adapted version of the VAMA revealed ToM impairments in clinical and subclinical populations. In general, schizophrenia patients performed significantly worse than the healthy controls on the VAMA (both total and subconstruct scores) and the Yoni task (subconstruct scores), which is in line with our expectations10,11,12,13. ToM impairment in schizophrenia patients could be explained by the deactivation of different brain regions. A recent systematic review61 suggested that, compared to healthy controls, schizophrenia patients demonstrated deactivation of brain areas during ToM tasks, such as the inferior parietal gyrus, anterior cingulate cortex, and posterior cingulate cortex, impacting their emotional processing, empathy, and executive functions. Moreover, unlike controls, the patients showed no activation of the cerebellum and thalamus62. In addition to neurobiological deficits, childhood adversity and emotional neglect have been identified as significant predictors of ToM impairment in schizophrenia patients54,63,64. The attachment-developmental-cognitive hypothesis suggests that key psychological features of schizophrenia, such as ToM deficits and negative self-and-other attributions, are attributable to disruptions or impairments in attachment relationships during the first decade of life65. Consequently, it is plausible that the psychological mechanisms underlying ToM deficits in schizophrenia are related to factors associated with disrupted attachments, such as reduced communication quality and insufficient emotional nurturance in early emotional neglect64. Additionally, other studies54,63 that examined the association between childhood trauma and brain function during ToM tasks in schizophrenia patients have reported that adverse childhood experiences contribute to ToM impairments.
These findings underscore the critical role of early stress in shaping ToM-related brain networks differently in healthy controls and schizophrenia patients, partially explaining the clinical and behavioral outcomes of the disorder. However, the current study did not test the role of childhood adversity, so further examination is needed. Taken together, our results confirm that, similar to their Western counterparts, Chinese schizophrenia patients show clear impairments in all four subconstructs of ToM66.
However, some results from the Yoni task in the current study were unexpected and inconsistent with previous studies. The overall score of schizophrenia patients was not significantly poorer than that of healthy controls after controlling for IQ. This could be due to task complexity. Compared to the VAMA, the Yoni task focuses on basic ToM skills and incorporates visual stimuli, such as simple cartoon faces, eye-gaze direction, and the mouth shape of the character67. In contrast, the VAMA involves more facial perception, emotion recognition, and social information processing, which may better discern differences in ToM performance between healthy controls and schizophrenia patients. Consequently, this result suggests that the VAMA is more sensitive than the Yoni task in identifying ToM impairment among schizophrenia patients.
Another highlight of the current study is the associations between clinical symptoms and error types. In particular, negative symptoms showed a significant correlation with “hypermentalizing” errors in schizophrenia patients, which is consistent with the results of previous studies14,68. Pelletier-Baldelli68 proposed a model suggesting that disruptions in lower-level processes can have a “bottom-up” impact on social cognition and behavior. This model posits that specific changes in sensorimotor processes and social cognitive functions influence certain negative symptoms more than others. For instance, abnormalities in the parietofrontal cortical pathway, which is involved in monitoring personal space, could affect both the expression and interpretation of social cues. This disruption might negatively impact mentalizing abilities and, consequently, social motivation, while having minimal effect on speech production or facial expressions. Additionally, abnormalities in nearby parietofrontal circuits involved in eye movements and gestures may lead to different behavioral outcomes, contributing to various negative symptoms. Therefore, we speculate that the sensorimotor experiences of schizophrenia patients might heighten their propensity for overthinking or affect their interpretation of others’ mental states from social cues. Future studies could examine other subclinical groups, such as those with avolition or alogia68,69.
Notably, no significant associations were observed between positive symptoms and other error types, which differs from the findings of previous studies4,22. This discrepancy may be explained by cultural differences. Beck et al.70 studied cross-cultural differences in ToM impairment between schizophrenia patients from China and Denmark, suggesting that the complexity of ToM tasks, such as those involving multiple individuals discussing the mental states of a third person, is likely to be influenced by cultural factors, particularly in complex scenarios where language aids in developing the cognitive representations necessary for such reasoning71,72. Notably, language abilities encompass semantics, syntax, and pragmatics, with pragmatic skills being essential for the appropriate use and interpretation of language in communicative contexts, typically evaluated through measures derived from naturalistic conversations72. Similarly, Quesque et al. found that differences between countries accounted for more than 20% of the variance in the results of the Faux Pas Test73 and the Facial Affect Recognition Test74. Their findings highlighted notable distinctions between Chinese and Western participants, suggesting cultural variations in perspective-taking. Unlike Western participants, Chinese individuals often prioritize understanding others’ perspectives over self-related ones. Our results provide new insights into how the clinical symptoms of Chinese schizophrenia patients relate to error types, with associations found for negative symptoms but not for positive symptoms. Additional research is needed to validate these findings.
The performance of individuals with high levels of SA in the subconstructs of ToM was lower than that of individuals with low levels of SA. Although this suggests that SA affects ToM, it should be noted that a statistically significant between-group difference was found only in second-order cognitive ToM. A study conducted by Lui et al.75 indicated that unaffected relatives of schizophrenia patients exhibiting negative schizotypal features were impaired in several cognitive abilities, such as visual-spatial memory, working memory, and verbal fluency. The VAMA task engages multiple cognitive functions, including memory, executive functioning, and social cognition, and these cognitive processes are closely linked to ToM ability, especially higher-order ToM12. Hence, individuals with a high level of SA may also have some difficulties with mentalizing (i.e., second-order cognitive ToM) when a higher level of reasoning skills is required, due to difficulties in general higher-order cognitive abilities. This suggests that second-order cognitive ToM is more sensitive in identifying ToM impairment among at-risk individuals. This result is in line with another recent systematic review76, which also demonstrated an association between ToM abnormalities and negative schizotypy (Cohen’s d = −0.15). This result differed from those of studies that found significant deficits in ToM and/or emotion recognition performance among the SA extreme groups16,77,78, which could be related to our small sample size in Study 2 and the different measurements used in previous studies. Future studies with larger sample sizes are needed to replicate this study and better understand ToM impairment in this subclinical population.
This study has some limitations. First, confirmatory factor analysis (CFA) was not performed. Due to the small sample size of healthy young adults (n = 100), we were unable to meet the minimum sample requirement for conducting the CFA (i.e., typically five to ten participants per item)79. Further research is needed to validate this version of the VAMA. Second, the sample size of at-risk individuals (viz., those with high levels of SA) was small. Given their more subtle and less severe social cognitive impairments, a larger sample size is needed to clarify the nature of ToM impairment in this group. Third, only two groups on the schizophrenia spectrum were included in this study. Future studies should include other groups (e.g., individuals with positive schizotypy, first-degree relatives, and ultra-high-risk individuals) to provide a more comprehensive picture of the extent of ToM impairments in schizophrenia spectrum disorders. Fourth, although the ToM impairments found in this study are likely to be related to deficiencies in the underlying neural mechanisms in schizophrenia patients and those with high levels of SA, our study only collected and reported behavioral data. To understand the neural mechanisms underlying ToM impairments in our participants, neuroimaging techniques, such as fMRI or fNIRS, are needed. Fifth, participants with a history of substance abuse or drug addiction were excluded. While this exclusion criterion was implemented to enhance the internal validity of the study by reducing confounding variables, it may have inadvertently limited the generalizability of our findings. By excluding these individuals, our sample may not fully represent the broader population of people with psychotic disorders. Future research should consider including participants with a history of substance abuse to capture this diversity and to refine the applicability of the VAMA.
Sixth, the current study did not collect convergent validity measures as in Canty et al.’s study26. Instead, we focused on examining theory-consistent known-group differences as evidence of criterion validity, demonstrating that schizophrenia patients performed worse than healthy controls in the VAMA. However, this approach did not fully rule out alternative explanations for the group difference, such as general cognitive deficits in schizophrenia. Future research should incorporate additional measures to establish the convergent validity of the adapted version of the VAMA. Finally, ToM deficits are a transdiagnostic issue beyond schizophrenia. More research is needed to understand how these deficits manifest across different clinical disorders.
The study also has some implications. The adapted version of the VAMA demonstrates potential as a tool for distinguishing between individuals with high SA and those with schizophrenia based on ToM impairments. In our study, schizophrenia patients exhibited ToM impairments across all subconstructs of the VAMA, while individuals with high SA showed differences in only one subconstruct. This pattern suggests that the VAMA can capture the broader ToM deficits characteristic of schizophrenia, while also detecting more subtle impairments associated with high SA. The VAMA could be used to assess at-risk individuals by comparing their scores to standardized mean scores derived from clinical and nonclinical populations. Such comparisons could facilitate the early identification of at-risk individuals and inform targeted interventions and support. The VAMA’s sensitivity to varying degrees of ToM impairment underscores its utility as a discriminative tool in both research and clinical practice. In addition, it is important to acknowledge the potential influence of dimensionality on the assessment of ToM constructs. As noted by previous researchers80,81, the novelty of employing VR and three-dimensional (3D) technologies in therapeutic settings often highlights their potential motivational effects. However, the reported experiences with these technologies are mixed80. Future research examining the effects of 3D and 2D representations could provide valuable insights into how dimensionality impacts the user experience and cognitive engagement.
In conclusion, the newly adapted Chinese version of the VAMA is a reliable and ecologically valid tool. To the best of our knowledge, it is the only available tool in Chinese that can assess the four subconstructs of ToM and the three types of ToM errors in people with schizophrenia spectrum disorders. Understanding these specific impairments of ToM subconstructs is crucial, as it can provide insights into the nuanced ways in which schizophrenia affects social cognition. Further research on the VAMA in other clinical or subclinical populations is needed to clarify the utility of the adapted VAMA in identifying individuals at risk of psychosis or psychotic disorders for early assessment and prevention.
Responses