Phenotypic divergence between individuals with self-reported autistic traits and clinically ascertained autism
Main
Online platforms such as Amazon Mechanical Turk and Prolific have become increasingly popular for data collection in human-participant research1,2. Through such platforms, researchers can rapidly collect data from hundreds or thousands of participants, allowing for better-powered and more-diverse samples than traditional laboratory data collection. Despite the many benefits of online research, there are also considerable concerns about the quality and validity of such data3. For example, previous research has reported low test–retest reliability, incoherent answers and inattention among online participants on multiple platforms3,4,5. For studies on psychiatric and neurodevelopmental disorders, current online research also relies primarily on self-report surveys to capture traits and diagnoses. Many of these conditions are characterized by altered insight and/or metacognitive awareness6,7,8, and many psychiatric surveys lack diagnostic specificity in the general population9,10. Therefore, self-report alone may not be the most accurate way to identify individuals with certain diagnoses or assess observable behavior in certain domains. Without the in-depth clinical characterizations afforded by lab-based research, there is nothing to compare self-report against, making it difficult to determine its generalizability and ecological or clinical validity.
Such concerns may be especially relevant when studying individuals with autism spectrum disorder (ASD), which is often associated with differences in social and self-insight. Previous investigations in ASD have highlighted discrepancies between self- and caregiver-reported autistic traits11,12,13 and co-occurring psychiatric conditions14. Although one study has identified significant correlations between self- and caregiver-reported autistic trait prominence, the combined measures were more predictive of independent living and employment outcomes than either measure alone, suggesting they do not provide redundant information15. In addition, a large-scale study found that self-reported traits among adults evaluated for ASD were not predictive of receiving a diagnosis16. Such findings suggest that foregoing the collection of outsider reports, as is often done in online research, may impede the contextualization of self-reported traits and limit the ability to predict clinical diagnoses and objective outcomes. While self-reported traits provide important information about subjective experience and well-being, they may not be an appropriate diagnostic shortcut.
In autistic individuals, discrepancies between self-reported and externally assessed traits may relate to core socioemotional symptoms, including differences in insight and theory of mind (ToM). Differences in ToM—or one’s ability to accurately infer the emotions, intentions and mental states of others—may make it difficult for some individuals with ASD to answer questions about others’ perceptions of themselves (for example, the prompt from the Broad Autism Phenotype Questionnaire (BAPQ): ‘People get frustrated by my unwillingness to bend’)17. This means that although self-reports may be a useful tool for understanding the internal and lived experiences of autistic individuals, they may not provide a complete, self-aware perspective of the person’s behaviors, communication or socioemotional sensitivity and effectiveness as perceived by others or in reference to societal standards.
It is also likely that self-reported traits do not always confer the same meaning in individuals with and without a clinical ASD diagnosis. Although subclinical autistic traits exist in the general population, psychometric investigations into self-report measures have suggested that equal scores in autistic and non-autistic individuals do not necessarily indicate equal levels of autistic traits18. In individuals without an ASD diagnosis, a high level of self-reported autistic traits may be more reflective of anxiety surrounding social situations19,20 and may have little effect on actual social functioning if they are able to compensate through other adaptive behaviors. By contrast, individuals with ASD may underreport their social difficulties due to impaired insight and ToM11,12,13. Thus, interpretations of self-reported traits in the general population may not be directly applicable to ASD.
In this study, we sought to systematically examine trait- and sociocognitive-level similarities and differences between adults with high autistic traits recruited online via self-report and adults with ASD defined via in-person clinical characterization. As social differences are a core feature of ASD, we chose to compare the online and in-person samples on their behavior during dynamic social interaction tasks. Specifically, we aimed to probe strategies for dynamically navigating changing social landscapes and exerting control over social environments, as autistic people show key differences in these areas that can hinder their satisfaction with social relationships and opportunities for employment. We hypothesized that individuals recruited online who self-reported high autistic traits would show distinct social interaction tendencies compared with a clinically defined in-person ASD sample. If true, such results would suggest that self-reported traits alone are not sufficient to identify adults with ASD in online studies.
Results
Autistic and other social traits
Participants were either enrolled from an online pool of ‘unselected’ adults from the community (via Prolific, n = 502) or enrolled in person after their ASD diagnoses were confirmed (at the Seaver Autism Center in New York City; n = 56; see Methods for details). The online sample was further subdivided into ‘high-trait’ (n = 168) and ‘low-trait’ (n = 121) groups based on their total scores on the BAPQ17. From within each of these groups, 56 participants were selected to match the in-person ASD sample on age and sex/gender. This resulted in three groups with 56 participants each: high-trait, low-trait and ASD. See Table 1 for demographic characteristics of each group. All comparisons were made on the basis of measurements taken from distinct samples.
As anticipated owing to how the groups were defined, the three groups differed in their self-reported autistic traits, as measured by BAPQ scores (F(2,163) = 232.86, P = 1.66 × 10–48, ηpartial2 = 0.74; Fig. 1a). This difference was driven by lower BAPQ scores (indicating fewer traits) in the low-trait group compared with both the high-trait group (t(110) = 18.88, P = 2.37 × 10–14, estimated difference = 1.74, 95% confidence interval (CI) [1.52, 1.95], Cohen’s d = 3.57) and the ASD group (t(112) = 18.51, P = 7.26 × 10–14, estimated difference = 1.71, 95% CI [1.49, 1.93], Cohen’s d = 3.52). The high-trait group and the ASD group did not differ in BAPQ scores (t(111) = −0.28, P = 0.957, estimated difference = −0.026, 95% CI [−0.25, 0.19], Cohen’s d = −0.05).
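As a rough check on the effect sizes reported above, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch below uses the standard fixed-effects identity; the function name is illustrative, and we assume (the text does not say) that the values reported for the mixed-effects models were derived in a comparable way.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from F and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: the reported values follow this standard conversion.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Omnibus group effect on BAPQ scores reported above: F(2,163) = 232.86
bapq_effect = partial_eta_squared(232.86, 2, 163)
print(round(bapq_effect, 2))  # prints 0.74
```

The same conversion reproduces the other omnibus effect sizes in this section to two decimal places, which suggests the reported F values, degrees of freedom and effect sizes are internally consistent.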

a, The ASD (n = 56 participants) and high-trait (HT; n = 56 participants) groups had comparable levels of self-reported autistic traits (measured via BAPQ; two-sided pairwise comparisons using estimated marginal means, with confidence intervals and P values adjusted for multiple comparisons using the Tukey method: t(111) = −0.28, P = 0.957, estimated difference = −0.026, 95% CI [−0.25, 0.19], Cohen’s d = −0.05; mean ASD: 3.82, mean HT: 3.85, mean low-trait (LT): 2.11). b,c, Investigation into traits of other disorders characterized by social impairment revealed that, compared with both other groups (n = 56 participants each), the high-trait group (n = 56 participants) self-reported a higher level of social anxiety (two-sided mixed-effects model with random intercept for matched pair ID: F(2,163) = 59.80, P = 3.33 × 10–20, ηpartial2 = 0.42; mean ASD: 35.39, mean HT: 46.43, mean LT: 19.21 (b)) and avoidant personality disorder (AVPD) symptoms (two-sided mixed-effects model with random intercept for matched pair ID: F(2,163) = 107.84, P = 1.46 × 10–30, ηpartial2 = 0.57; mean ASD: 20.09, mean HT: 23.80, mean LT: 11.36 (c)). d, In the in-person ASD group (n = 56 participants), there was no relationship between clinician-rated autistic traits measured via ADOS (mean = 13.85) and self-reported autistic traits measured via BAPQ (two-sided general linear model: b = 0.025, s.e.m. = 0.02, t(51) = 1.16, P = 0.251, 95% CI [−0.018, 0.067], ηpartial2 = 0.01). e,f, Broken down by subscales, there was no agreement in the restricted and repetitive behavior domain (RRB; general linear model: b = 0.12, s.e.m. = 0.06, t(51) = 1.95, P = 0.057, 95% CI [0.0, 1.0], ηpartial2 = 0.05 (e)) or the social domain (general linear model: b = 0.05, s.e.m. = 0.04, t(51) = 1.17, P = 0.249, 95% CI [0.0, 1.0], ηpartial2 = 0.03 (f)). Error bars represent the s.e.m.; gray shading around regression lines represents the 95% confidence interval. *P < 0.05; **P < 0.01; ***P < 0.001.
To evaluate the specificity of self-reported traits to ASD versus other disorders characterized by social differences, we also evaluated social anxiety and avoidant personality disorder (AVPD) symptoms. Although considered clinically distinct, social anxiety, AVPD and autism have overlapping features (such as withdrawal from social situations21,22,23,24) that frequently cluster together in investigations of transdiagnostic factors25,26,27. The groups differed in their social anxiety symptoms (F(2,163) = 59.80, P = 3.33 × 10–20, ηpartial2 = 0.42; Fig. 1b), such that the high-trait group had higher scores (indicating more symptoms) than both the low-trait group (t(110) = 10.87, P = 5.72 × 10–14, estimated difference = 27.3, 95% CI [21.4, 33.3], Cohen’s d = 2.06) and the ASD group (t(111) = −4.34, P = 9.18 × 10–5, estimated difference = −10.9, 95% CI [−16.9, −4.96], Cohen’s d = −0.82), and the ASD group had higher scores than the low-trait group (t(112) = 6.49, P = 7.33 × 10–9, estimated difference = 16.4, 95% CI [10.4, 22.4], Cohen’s d = 1.23). Finally, the groups differed in their AVPD traits (F(2,163) = 107.84, P = 1.46 × 10–30, ηpartial2 = 0.57; Fig. 1c). The pairwise group differences for AVPD traits follow the same pattern as social anxiety: the high-trait group had higher scores (indicating more symptoms) than both the low-trait group (t(110) = 14.58, P = 2.27 × 10–14, estimated difference = 12.50, 95% CI [10.42, 14.58], Cohen’s d = 2.70) and the ASD group (t(111) = −4.18, P = 1.73 × 10–4, estimated difference = −3.66, 95% CI [−5.74, −1.58], Cohen’s d = −0.79), and the ASD group had higher scores than the low-trait group (t(112) = 10.07, P = 0, estimated difference = 8.84, 95% CI [6.76, 10.93], Cohen’s d = 1.91).
In addition to the self-report measures, in-person participants completed the Autism Diagnostic Observation Schedule (ADOS-2; module 4)28, considered the ‘gold standard’ clinical assessment measure for ASD. Surprisingly, there was no significant relationship between self-reported ASD traits measured by BAPQ and those rated by clinicians using ADOS (b = 0.025, s.e.m. = 0.02, t(51) = 1.16, P = 0.251, 95% CI [−0.018, 0.067], ηpartial2 = 0.01; Fig. 1d). Broken down by subdomain, there was also no relationship between self- and clinician-rated traits in the restricted and repetitive behavior domain (b = 0.12, s.e.m. = 0.06, t(51) = 1.95, P = 0.057, 95% CI [0.0, 1.0], ηpartial2 = 0.05; Fig. 1e) or the social domain (b = 0.05, s.e.m. = 0.04, t(51) = 1.17, P = 0.249, 95% CI [0.0, 1.0], ηpartial2 = 0.03; Fig. 1f). Such limited agreement between self- and clinician-rated assessments suggests that they may not be measuring the same features of ASD: whereas self-reported assessments can capture subjective internal experiences, clinician-rated assessments may capture external presentation of traits. Our results suggest that, in ASD, these two domains do not always agree.
Social behavior
As social differences are a hallmark feature of ASD, we chose to compare our groups on their behavior in two dynamic social interaction tasks. The paradigms outlined below allow for the quantification of complex social processes, including exertion of social control and navigation through ‘social space’.
Social controllability
Social controllability, or one’s ability to influence other people, is crucial for achieving optimal behavior during dynamic interactions and, subsequently, for mental wellbeing. To influence someone, one must first have a sense of that individual’s current and future thoughts, feelings and objectives. We hypothesized that impairments in ToM and social prediction may make it more difficult for individuals with ASD to exert social influence29,30,31,32,33.
To measure social controllability, we used a monetary exchange task26,27,34,35 modified from the ultimatum game, in which participants decide whether to accept or reject proposed splits of US$20 offered by players from two independent teams (Fig. 2a; see Methods for details). Unbeknown to participants and different from the traditional ultimatum game, participants have control over the offers proposed by one of the teams (‘controllable condition’). Specifically, participants can increase future offers by rejecting current ones or decrease future offers by accepting the current ones. At the end of the task, participants rate how much control they believed they had over players from each team (Methods).
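The controllable condition described above can be sketched as a simple update rule: a rejection raises the next offer and an acceptance lowers it. The code below is a minimal illustration only. The starting offer, the $1 step size, the $1 to $9 offer bounds and the deterministic update are all assumptions made for the example; the actual task parameters are given in Methods.

```python
from typing import Callable, List


def next_offer(current_offer: int, accepted: bool, step: int = 1,
               low: int = 1, high: int = 9) -> int:
    """Return the next offer in a hypothetical controllable condition.

    Rejecting raises the next offer and accepting lowers it; the step size
    and the bounds are illustrative assumptions, not the task's parameters.
    """
    change = -step if accepted else step
    return max(low, min(high, current_offer + change))


def simulate(policy: Callable[[int], bool], n_trials: int = 6,
             start_offer: int = 5) -> List[int]:
    """Play n_trials against the controllable team and return the offers seen."""
    offers = []
    offer = start_offer
    for _ in range(n_trials):
        offers.append(offer)
        offer = next_offer(offer, accepted=policy(offer))
    return offers


def always_accept(offer: int) -> bool:
    """A participant who never exerts control."""
    return True


def reject_below_8(offer: int) -> bool:
    """A participant who rejects until offers reach $8 (accept if offer >= 8)."""
    return offer >= 8


print(simulate(always_accept))
print(simulate(reject_below_8))
```

Under these assumptions, a participant who accepts everything drives offers down to the floor, whereas one who is willing to reject even moderately good offers pushes them upward. This is the sense in which rejecting high offers indexes exerted control.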

a, As shown in the representative task screen, the social control task involved participants accepting or rejecting splits of $20 proposed by members of two virtual teams. b, Participants played the game with two different teams sequentially, the order of which was counterbalanced. With one of the teams (controllable condition), participants could increase future offers by rejecting the current one, or decrease future offers by accepting the current one. c, All groups (n = 56 participants each) showed comparable overall rejection rates for both conditions (two-sided mixed-effects model with random intercept for matched pair ID: F(2,281) = 0.77, P = 0.46, ηpartial2 = 0.006; mean ASD controllable: 52.4%, mean HT controllable: 55.5%, mean LT controllable: 54.7%, mean ASD uncontrollable: 49.6%, mean HT uncontrollable: 51.9%, mean LT uncontrollable: 48.2%). d, When rejection rate was broken down by offer size, the ASD group (n = 56 participants) rejected a lower percentage of high offers than the two online groups (n = 56 participants each) during the controllable condition (two-sided mixed-effects model with random intercept for matched pair ID, P values false discovery rate (FDR)-corrected for multiple comparisons: F(2,122) = 6.12, P = 0.009, ηpartial2 = 0.09; mean ASD: 29.1%, mean HT: 45.8%, mean LT: 49.5% (left)). The groups (n = 56 participants each) did not differ in rejection rates in the uncontrollable condition (all two-sided mixed-effects models with random intercepts for matched pair IDs, P values FDR-corrected for multiple comparisons: low offers: F(2,111) = 1.19, P = 0.46, ηpartial2 = 0.02; medium offers: F(2,111) = 0.46, P = 0.71, ηpartial2 = 0.006; high offers: F(2,111) = 3.82, P = 0.075, ηpartial2 = 0.06 (right)). e, Unlike the online groups (n = 56 participants each), the ASD group (n = 56 participants) did not detect a difference in controllability between the conditions (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,269) = 18.52, P = 2.91 × 10–8, ηpartial2 = 0.12; mean ASD controllable: 45.47, mean HT controllable: 67.45, mean LT controllable: 61.07, mean ASD uncontrollable: 41.79, mean HT uncontrollable: 19.66, mean LT uncontrollable: 24.70). Error bars represent the s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001. Panels adapted with permission from: a, ref. 27, Elsevier; b, ref. 35 under a Creative Commons license CC BY 4.0.
Because participants can raise offers only by rejecting a current proposal, we first sought to characterize their rejection rate both overall and as a function of offer size. We found that the three groups showed similar overall rejection rates during the task (F(2,281) = 0.77, P = 0.46, ηpartial2 = 0.006; Fig. 2c). Breaking rejection rate down by offer size, we found that, while the groups showed similar rejection rates for low offers (F(2,74) = 0.20, P = 0.82, ηpartial2 = 0.005) and medium offers (F(2,162) = 1.67, P = 0.29, ηpartial2 = 0.02), the ASD group rejected a smaller percentage of high offers (F(2,122) = 6.12, P = 0.009, ηpartial2 = 0.09; Fig. 2d) compared with both low-trait (t(92) = −3.34, P = 0.005, estimated difference = −0.20, 95% CI [−0.36, −0.05], Cohen’s d = −0.72) and high-trait (t(86) = −2.67, P = 0.025, estimated difference = −0.16, 95% CI [−0.30, −0.02], Cohen’s d = −0.57) online groups. Patterns of rejection rates did not differ across groups for the uncontrollable condition (low: F(2,111) = 1.19, P = 0.46, ηpartial2 = 0.02; medium: F(2,111) = 0.46, P = 0.71, ηpartial2 = 0.006; high: F(2,111) = 3.82, P = 0.075, ηpartial2 = 0.06), with each group showing the highest rejection rates for low offers ($1–$3) and the lowest rejection rates for high offers ($7–$9). Together, these results suggest that high-trait online participants behaved more similarly to the low-trait online group than to the clinical ASD group during controllable social interactions, whereas the clinical ASD group demonstrated distinctly reduced ability to exert control.
We next investigated whether participants differed in their subjective perception of the controllability that they had. Indeed, we detected a significant group-by-condition interaction on perceived control ratings (F(2,269) = 18.52, P = 2.91 × 10–8, ηpartial2 = 0.12; Fig. 2e). In the controllable condition, the ASD group perceived less control than both the high-trait group (t(110) = −3.79, P = 0.0007, estimated difference = −21.6, 95% CI [−35.2, −8.1], Cohen’s d = −0.73) and the low-trait group (t(111) = −2.64, P = 0.025, estimated difference = −15.1, 95% CI [−28.7, −1.5], Cohen’s d = −0.51); the high- and low-trait groups did not differ from each other (t(107) = 1.16, P = 0.48, estimated difference = 6.5, 95% CI [−6.8, 19.8], Cohen’s d = 0.22). In the uncontrollable condition, the ASD group reported having more control than both the high-trait group (t(110) = 4.46, P = 5.93 × 10–5, estimated difference = 21.8, 95% CI [10.2, 33.5], Cohen’s d = 0.86) and the low-trait group (t(111) = 3.38, P = 0.003, estimated difference = 16.6, 95% CI [4.9, 28.3], Cohen’s d = 0.65); the high- and low-trait groups once again did not differ from each other (t(107) = −1.08, P = 0.53, estimated difference = −5.2, 95% CI [−16.7, 6.3], Cohen’s d = −0.20). Such results suggest that, compared with both online groups, the clinically defined ASD sample was less accurate in detecting changes in social controllability. In conjunction with the rejection rate result, these findings suggest that clinically confirmed adults with ASD, but not those defined solely by high autistic traits, showed an altered ability to exert influence and an altered perception of their controllability during social interactions.
Social navigation task
Successful navigation of social interactions requires the ability to dynamically update information about relationship dynamics and the ability to act in accordance with such information. For example, if during a conversation with a stranger, you were to learn that they are actually a close friend of one of your relatives, you may choose to be friendlier with them going forward. ASD is associated with social challenges that may affect the ability to accumulate and/or apply evidence about relationship dynamics and hinder adaptive navigation of social interactions.
To evaluate participants’ social feelings and actions during dynamic interactions, we utilized the social navigation task36. The social navigation task is a narrative-based game in which participants interact with a variety of virtual characters with the goal of finding a job and a place to live (Fig. 3a). The task consists of both story-building narrative trials and choice-point interaction trials. During interaction trials, participants choose between two ways to interact with a given character. Unbeknown to the participant, these choices reflect opposing changes in either the power or the affiliation dynamic between them and the characters. At the end of the narrative, participants are asked to rate the characters on how much they liked interacting with them (see Methods for further task details).
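One way to picture the scoring of such a task is to treat each interaction choice as a unit step along either the affiliation or the power axis and to summarize behavior as the average step per axis. The sketch below is an assumption-laden illustration: the class name, the ±1 coding of choices and the averaging rule are hypothetical and are not taken from the task's actual implementation, which is described in Methods.

```python
from dataclasses import dataclass


@dataclass
class SocialSpaceTracker:
    """Tracks one participant's choices in a hypothetical 2D social space."""
    affiliation_sum: int = 0
    power_sum: int = 0
    affiliation_trials: int = 0
    power_trials: int = 0

    def record(self, dimension: str, direction: int) -> None:
        """Record one choice; direction is +1 or -1 (assumed coding)."""
        if direction not in (-1, 1):
            raise ValueError("direction must be +1 or -1")
        if dimension == "affiliation":
            self.affiliation_sum += direction
            self.affiliation_trials += 1
        elif dimension == "power":
            self.power_sum += direction
            self.power_trials += 1
        else:
            raise ValueError("dimension must be 'affiliation' or 'power'")

    def affiliation_tendency(self) -> float:
        """Mean affiliation choice in [-1, 1]; 0.0 if no affiliation trials."""
        if self.affiliation_trials == 0:
            return 0.0
        return self.affiliation_sum / self.affiliation_trials

    def power_tendency(self) -> float:
        """Mean power choice in [-1, 1]; 0.0 if no power trials."""
        if self.power_trials == 0:
            return 0.0
        return self.power_sum / self.power_trials


tracker = SocialSpaceTracker()
for dimension, direction in [("affiliation", 1), ("affiliation", 1),
                             ("affiliation", -1), ("power", 1)]:
    tracker.record(dimension, direction)
print(tracker.affiliation_tendency(), tracker.power_tendency())
```

Under this coding, a positive affiliation tendency means the participant chose the friendlier option more often than not, and a value near zero means the two kinds of choice were balanced.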

a, The social navigation task involved participants interacting with different characters with the goal of finding a job and a home. At each interaction, participants could choose between two options that affected either the affiliation or power dynamics of the relationship. Behind the scenes, each decision would move that character’s position accordingly in a social space framed by axes of power and affiliation. b, Compared with the low-trait group, the high-trait and ASD groups (n = 56 participants each) both reported a reduced liking of the characters in the social navigation task (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,111) = 8.11, P = 0.0005, ηpartial2 = 0.13; mean ASD: 51.09, mean HT: 51.98, mean LT: 59.10). c, Despite having comparable feelings toward characters, the ASD group (n = 56 participants) acted less affiliative than the high-trait group (n = 56 participants; two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,111) = 17.21, P = 3.098 × 10–7, ηpartial2 = 0.24; mean ASD: 0.16, mean HT: 0.30, mean LT: 0.46). d, The groups (n = 56 participants each) did not differ in their power tendencies (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,163) = 1.89, P = 0.15, ηpartial2 = 0.02; mean ASD: 0.13, mean HT: 0.19, mean LT: 0.09). e, No group-by-trait interaction on character liking was detected (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,155) = 1.76, P = 0.18, ηpartial2 = 0.02). f, However, the relationship between affiliative behavior and self-reported traits differed by group (n = 56 participants each; two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,160) = 3.42, P = 0.035, ηpartial2 = 0.04). Only the ASD group (n = 56 participants) showed a negative correlation between self-reported traits and affiliation tendency (two-sided Pearson’s correlation, P value FDR-corrected for multiple comparisons: r(54) = −0.38, P = 0.011). Error bars represent the s.e.m.; gray shading around regression lines represents the 95% confidence interval. *P < 0.05; **P < 0.01; ***P < 0.001. Panel a adapted with permission from ref. 37, Elsevier.
We began by investigating participants’ subjective feelings toward characters in the task and found that the three groups differed in their ratings of character likability (F(2,111) = 8.11, P = 0.0005, ηpartial2 = 0.13; Fig. 3b). Compared with the low-trait group, both the high-trait group (t(110) = −3.31, P = 0.004, estimated difference = −7.1, 95% CI [−12.2, −2.0], Cohen’s d = −0.63) and the ASD group (t(113) = −3.65, P = 0.001, estimated difference = −7.9, 95% CI [−13.0, −2.7], Cohen’s d = −0.69) self-reported reduced liking of characters. The high-trait and ASD groups did not differ in their character liking (t(111) = −0.36, P = 0.93, estimated difference = −0.77, 95% CI [−5.9, 4.3], Cohen’s d = −0.07), suggesting comparable subjective experiences.
To explore how each group behaved in the task, we investigated group differences in power and affiliation tendencies, averaged across all characters. A significant three-group difference in affiliation tendency (F(2,111) = 17.21, P = 3.10 × 10–7, ηpartial2 = 0.24; Fig. 3c) revealed that the ASD group acted significantly less affiliative with the characters than both the high-trait group (t(111) = −2.63, P = 0.026, estimated difference = −0.13, 95% CI [−0.25, −0.01], Cohen’s d = −0.50) and the low-trait group (t(112) = −5.86, P = 1.45 × 10–7, estimated difference = −0.30, 95% CI [−0.42, −0.18], Cohen’s d = −1.11), indicating unique behavioral tendencies in the clinically defined sample. The high-trait group was also less affiliative than the low-trait group (t(110) = −3.25, P = 0.004, estimated difference = −0.16, 95% CI [−0.28, −0.04], Cohen’s d = −0.61). The groups did not differ in their power tendencies (F(2,163) = 1.89, P = 0.15, ηpartial2 = 0.02; Fig. 3d).
Last, we explored the relationship between social tendencies and self-reported autistic traits or subjective task ratings in each group. There was no group-by-trait interaction on character liking (F(2,155) = 1.76, P = 0.18, ηpartial2 = 0.02; Fig. 3e). However, there was a significant group-by-trait interaction on affiliation tendency (F(2,160) = 3.42, P = 0.035, ηpartial2 = 0.04; Fig. 3f). While the ASD group showed a negative correlation between self-reported traits and affiliation tendency (r(54) = −0.38, P = 0.011), there was no relationship in the high-trait (r(54) = −0.15, P = 0.27) or the low-trait (r(54) = 0.18, P = 0.27) group. Thus, while the relationship between subjective ratings and self-reported autistic traits did not differ by group membership, the relationship between objective actions and self-reported traits was specific to the clinical sample.
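The per-group correlations above are reported with FDR-corrected P values. The text does not state which FDR procedure was applied; the sketch below assumes the widely used Benjamini–Hochberg step-up adjustment and runs it on made-up raw P values, purely to illustrate how such a correction behaves across a small family of tests.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Assumption: this is one common FDR procedure; the source does not
    specify which procedure was actually used.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for three tests (not values from the study).
raw = [0.01, 0.04, 0.03]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```

A property of this procedure is that adjusted values can tie, because each adjusted P value is capped by the adjusted value of the next-largest raw P value.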
Self-reported diagnoses
All participants completed a questionnaire asking them to self-report whether they had received certain clinical diagnoses. While the in-person and high-trait online groups reported comparable rates of social anxiety and AVPD diagnoses, only two participants in the high-trait online group reported having ASD diagnoses (see Supplementary Table 3 for further details). In the entire online sample, only ten participants reported having ASD diagnoses. We conducted an additional analysis to evaluate how these ten participants (‘online ASD’) compared with our in-person ASD sample. We found that the social traits and behaviors of the in-person ASD sample more closely resembled those of the online ASD participants than those of individuals with high autistic traits (see Supplementary Table 4 for further details). Although these findings should be interpreted with caution given the small sample size, they may suggest that self-reported diagnoses provide more clinically relevant information than self-reported traits. As dimensional approaches continue to gain popularity, researchers may wish to re-emphasize the merits of collecting diagnostic information in conjunction with symptom levels.
Discussion
Here we sought to investigate the phenotypic similarities and differences between online participants with high self-reported autistic traits and those with an ASD diagnosis confirmed in person via clinician evaluation. We identified a lack of agreement between self-rated and clinician-assessed trait measures, highlighting the need for separate interpretations of each. When investigating each group’s social behavior, although high-trait and clinically ascertained participants had similar levels of self-reported autistic traits, we found that only individuals with confirmed ASD showed impairments in recognizing opportunities to exert social control and reduced affiliation in their interactions with virtual characters; by contrast, high-trait individuals identified online showed comparable social behaviors to low-trait individuals. These results provide a caution for future online research: when attempting to identify and draw overarching conclusions about certain diagnostic groups, self-reported symptom surveys alone may not be sufficient.
Despite the lack of identified measurement agreement in this study, we do not believe that these results suggest that self-report questionnaires are invalid for ASD research. On the contrary, they are important tools for understanding the subjective experiences, levels of internal distress or wellbeing and needs of people with ASD. In the context of well-characterized samples, self-reports are crucial to ensure that individuals with lived experience have a role in shaping the narrative surrounding them, as they can challenge baseless assumptions regarding the intentions or reasoning behind the behaviors of people with ASD. Rather than dismiss the importance of self-views, the results provide a caution for the use of self-report alone for defining or extrapolating about a diagnostic group as a whole.
The discrepancy we detected between self-reported BAPQ and clinician-rated ADOS scores in the in-person ASD group is consistent with previous reports using different measures11,12,13. Discrepancies between self- and observer-rated traits are not uncommon among individuals with altered introspection; they have been reported in a variety of conditions characterized by altered insight, including depression37 and schizophrenia38. Evidence suggests that insight differences in such conditions may be more pronounced in certain domains. Among individuals with schizophrenia, for example, those with reduced insight have been shown to over-report their levels of extroversion but accurately report other personality traits, suggesting insight may have an important role in the reporting of social tendencies specifically39. Reduced social self-awareness has been widely reported in ASD40,41 and likely contributes to discrepancies between self- and clinician report. It is possible that, despite presenting with mild social symptoms to the outside observer, autistic individuals with more social awareness may report greater social difficulties due to increased insight and anxiety regarding their social differences from typically developing peers40,42.
In the social controllability task, the ASD group rejected a smaller percentage of high offers in the controllable condition compared with the online groups. This reduced rejection of ‘good’ offers hindered their ability to receive better offers down the line, suggesting they did not take advantage of the controllability offered by the condition. In line with this, we also saw that the ASD group did not self-report any differences in the perceived controllability of the conditions. Such results may stem from reductions in ToM-related understanding of others’ motivations in the clinical ASD group but not the high-trait group. To distinguish between random and non-random behaviors on the part of the players, participants must realize that the other players are motivated to receive the largest amount of money possible. To achieve this understanding, one might use prior information (that is, past offers) to build expectations about future behaviors (that is, players will give you repeatedly low offers as long as you continue to accept them) that would fit a given intention (that is, players want to maximize gain) and evaluate their accuracy. In ASD, impaired ability to predict offers and understand players’ intentions may lead to a lack of distinction between random and non-random (goal-directed) behavior. Indeed, individuals with ASD display a reduced understanding of social intentions (including whether actions are goal-directed43) that appears to stem from a reduced tendency to form expectations based on prior social information44. It is also possible that the reduced perception of controllability seen in ASD is caused by impaired affordance perception, which refers to the ability to ascertain which actions are available for you to take in a given environment. Autistic individuals have been shown to inaccurately estimate action capabilities in the perceptual–motor domain45, and such impairments are theorized to extend into the social domain46. In any case, the high-trait and low-trait online groups showed comparable behavior across all task measures, suggesting that this impaired detection of others’ goal-directed behaviors and/or perception of the actions available to oneself is specific to individuals with a confirmed ASD diagnosis.
In the social navigation task, although both the high-trait and ASD groups reported liking the characters less than the low-trait group, only the ASD group was less affiliative with characters during their interactions than other groups. Such results highlight the importance of measuring behavior for achieving a comprehensive understanding of trait presentation. The high-trait and ASD groups were aligned in their subjective beliefs, about both their traits and their opinions of others, but these beliefs did not translate into comparable social behaviors during the task. Considering that pro-affiliative behavior is often considered to be polite, and that individuals with ASD frequently exhibit lower adherence to social conventions31, this difference may be reflective of reduced awareness of or incentive to follow social pleasantry norms in ASD. By contrast, those without a confirmed diagnosis may be more inclined or better able to act friendly despite potentially neutral feelings or dislike of characters. In line with this idea, although we did not detect a group difference in the relationship between character liking and self-reported traits, we detected a distinct relationship between self-reported traits and affiliative behavior only within the ASD group—those with a higher level of traits were the least friendly with the characters. Such results provide further evidence that self-reported traits have different implications in individuals with and without a confirmed ASD diagnosis. Altogether, the findings from both tasks suggest that samples defined by online self-report are phenotypically distinct from clinically ascertained samples and that using such online samples to answer questions about social interaction may not be reflective of ASD as a whole.
In our study, the online group with high autistic traits also self-reported heightened levels of social anxiety and AVPD symptoms compared with the in-person ASD group. In addition, although our online high-trait and in-person ASD groups self-reported similar rates of social anxiety and AVPD diagnoses, less than 5% of individuals in the online high-trait group reported having received a clinical diagnosis of ASD. Thus, it is possible that the online group represents individuals with both subclinical and clinical levels of socially avoidant and/or anxious traits without co-occurring ASD. Self-reported autistic traits in the general population, absent ASD diagnoses, may be reflective of generalized social avoidance and self-consciousness regarding social skills rather than autism-specific social difficulties. Supporting the existence of this phenotype, large-scale online studies investigating latent psychiatric factors in the general population have identified transdiagnostic dimensions characterized by similar socially avoidant/anxious traits25,26. As we have shown, online participants who report elevated internal perceptions of social difficulties (that is, self-reported emotional or cognitive struggles that others may not notice) as indexed by BAPQ scores also show different social behaviors from those with a clinical diagnosis who show elevated external difficulties (that is, socially inappropriate actions or visible struggles, as described by clinician report), suggesting the diagnosis and the dimension are not synonymous. Although social anxiety commonly co-occurs with ASD, it is still only represented in less than half of ASD cases47, and co-occurrence with AVPD is even less common48. 
It may be the case that self-reported traits lack diagnostic specificity, especially at subclinical thresholds, whereas clinicians are better able to assess symptoms rising to clinical relevance and to assign them the most parsimonious diagnoses through comprehensive analysis of both observed external behaviors and reported experiences.
An important implication of the distinction between diagnosis and traits is that we must be cautious not to extrapolate about the needs of one group on the basis of the findings from research conducted in the other. For example, despite doing reasonably well by external metrics of social abilities, individuals with high self-reported autistic, anxious and avoidant traits may need supports or intervention toward boosting self-confidence and reducing anxiety and negative self-talk rather than social skills training. By contrast, individuals who self-report few traits but present to clinicians with observable difficulties in social interaction may benefit from more skills-focused training to aid in quality-of-life outcomes such as independent living, relationships and employment. This distinction is important because, without it, there is a potential risk of harm (or at least reduced access to benefits) to individuals with ASD who require more behavioral support and access to accommodations; if online self-report-based samples are used to represent the whole diagnostic spectrum despite clear differences in behavior, implications for intervention and accommodation may be biased.
This study should be interpreted with the following limitations in mind. First, we relied on a single self-reported autism trait measure—the BAPQ—because of its strong psychometric properties in both the general population and in those with an ASD diagnosis17,49,50. However, other surveys such as the Autism-Spectrum Quotient51 are also commonly used in research to assess autistic traits, although we note that the Autism-Spectrum Quotient also does not always converge with clinical/caregiver impressions12,14,16, similar to the BAPQ–ADOS discrepancy identified in the current study. Second, as our study does not specifically measure insight, we cannot determine whether the discrepancy between self- and clinician-rated symptoms is directly related to insight differences in ASD. Future research should utilize insight paradigms to test this theory. Third, the range of IQ scores in our in-person sample is high, suggesting that our sample may not be representative of the spectrum as a whole. While fluent language capabilities were necessary for completion of our study procedures, future studies with less cognitively demanding tasks should extend the generalizability of this work by including adults with ASD who have lower cognitive functioning and/or higher support needs. In addition, we do not have evidence to examine whether the current findings are specific to ASD or generalizable to other psychiatric diagnoses such as schizophrenia or personality disorders where impaired insight can be a trait. Future research is needed to investigate the broader implications of this work.
Finally, since the inception of this study, Prolific has added a screening tool that allows researchers to specifically select participants that self-report having received a formal clinical diagnosis of ASD. Although this information still lacks the verification and context provided by a full clinical evaluation, the ability to specifically recruit individuals who report a clinical diagnosis would allow researchers to access a larger pool of participants who identify as autistic. Given our preliminary findings in the few individuals who self-report an autism diagnosis in our online sample (Supplementary Table 4), future work should investigate whether the use of additional trait measures and/or selection based on self-reported diagnoses in online studies would identify a group that shows behavior more closely aligned with the ASD phenotype. It is possible that other differences will become apparent; for example, individuals with high socially avoidant and anxious traits may be especially inclined toward participating in non-confrontational online studies rather than in-person studies involving face-to-face interactions, highlighting another important consideration when conducting online research. Future online studies recruiting individuals with self-reported diagnosis should also investigate potential platform differences in social profiles.
As online research continues to proliferate, we must consider the limitations of online approaches when determining which scientific questions they are best suited to answer. Questions about transdiagnostic traits and symptoms, for example, avoid the issues with diagnostic specificity in self-report and may be well suited for testing with online platforms, especially for traits not associated with impaired insight. Online research is a powerful tool that will continue to help answer important questions in human-participant research. However, the results of the current study suggest that online approaches in psychiatry should be used in tandem with—rather than as a replacement for—lab-based research, and that over-generalization of findings should be avoided in research relying on self-reported traits. For questions that require big data, researchers have other tools at their disposal: pooling resources, developing cross-site collaborations or utilizing resources such as Simons Foundation Powering Autism Research52 will allow for large-scale replications of lab-based studies in ASD that are less reliant on self-report.
Methods
Participants
The study was approved by the institutional review board at the Icahn School of Medicine at Mount Sinai, and all participants provided informed consent before participation. All participants were compensated for their participation. Baseline payment was US$20 per hour for in-person participants and an average of US$17.25 total for online participants. All participants were additionally paid a bonus based on the reward drawn from a random trial of the social controllability task. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
Online participants were enrolled in the study as part of a larger online project examining social cognition and mental health. Participants were recruited from Prolific (www.prolific.co), an online research participant recruitment site, with the eligibility criteria of (1) aged between 18 and 64, (2) currently living in the USA and (3) >90% approval rating in Prolific. Participants provided consent by clicking ‘I Consent’ after reading information about the study and were paid for their participation after completion, in accordance with policies on Prolific and at Mount Sinai’s School of Medicine. A total of 1,499 individuals attempted the initial study, which included the social controllability task (April 2020). From this, 14 participants were excluded due to duplication of their data files, and an additional 143 participants were excluded for flat behavior during the task (accepting or rejecting all offers). Of the initial push, 1,269 responded to a follow-up study containing relevant questionnaires (June–August 2020); 38 were excluded for questionnaire non-completion, 9 were excluded for exceeding the questionnaire time limit, and 47 were excluded due to missed attention checks or ID errors. This resulted in a total of 1,041 participants with usable questionnaire and social controllability task data. Also out of the initial push, 733 participants responded to a follow-up study to complete the social navigation task (April 2021–January 2022); 157 were excluded for either not having a plausible average decision response time (within ±2 s.d.) or having at- or below-chance post-task memory, resulting in 576 participants with complete social navigation task data. In total, 502 online participants completed all aspects of the study.
Over the course of approximately 2 years, 259 individuals were screened in person for inclusion/exclusion by clinical staff at the Seaver Autism Center for Research and Treatment at the Icahn School of Medicine at Mount Sinai (August 2021–June 2023). Participants were recruited through announcements posted on physical flyers around New York City and email listservs with the eligibility criteria of (1) age between 18 and 50, (2) meet criteria for ASD and (3) IQ > 70. To be inclusive, both those with past ASD diagnoses and those referred or self-referring with questions of autism were screened. Of the 259 participants screened, 171 did not meet criteria for ASD; 88 participants met clinical criteria and enrolled in the current study. Participants were screened for ASD by licensed, research-reliable clinicians using the ADOS-2, developmental and clinical history, self- and informant (for example, parent or roommate) report of symptoms and adaptive behavior, cognitive functioning and clinical judgment regarding whether the individual meets the criteria for ASD in the Diagnostic and Statistical Manual of Mental Disorders 5th edition60. Of the 88 initially enrolled in the study, 4 were excluded due to a loss to follow-up and/or unavailability to come into the lab. Of the 84 who attempted the tasks, 64 performed the tasks inside of the magnetic resonance imaging scanner to examine neural questions for an additional study, and 20 performed the tasks outside of the scanner on a laptop due to magnetic resonance imaging contraindications. Both groups were included in this study. To be included in the final sample for the social navigation task, participants had to respond on at least 75% of trials and have above-chance post-task memory scores. To be included in the final sample for the social controllability task, participants could not have flat behavior (for example, accepted or rejected all offers in either condition). 
After exclusion, the final sample for the social navigation task was 71 participants, and the final sample for the social controllability task was 67 participants; 56 successfully completed both tasks without exclusion. The final sample of 56 participants included 18 (32.1%) adults who had received an ASD diagnosis before enrolling in our study, while the other 38 (67.9%) enrolled because they identified as autistic and received a first-time diagnosis through this study.
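The task-level inclusion rules described above can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the input layout and the `chance_level` parameter are hypothetical, and the chance level for the memory questions is not stated in the text, so it is left as an argument.

```python
def is_flat(accepted_by_condition):
    """Return True if all offers were accepted, or all rejected, in any condition.

    `accepted_by_condition` maps a condition name to a list of booleans
    (True = offer accepted). The layout is a hypothetical choice for this sketch.
    """
    return any(
        all(choices) or not any(choices)
        for choices in accepted_by_condition.values()
    )


def passes_navigation_criteria(n_responded, n_trials, memory_score, chance_level):
    """Social navigation inclusion: >= 75% of trials answered and above-chance memory.

    `chance_level` is an assumption of this sketch; the text does not give its value.
    """
    return n_responded / n_trials >= 0.75 and memory_score > chance_level
```

For example, a participant who accepted every offer in one condition would be flagged by `is_flat` even if their choices varied in the other condition, matching the "in either condition" wording used for the in-person sample.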
Measures
To assess levels of autistic traits in the sample, all participants completed the BAPQ. The BAPQ was selected due to its high sensitivity, specificity and test–retest reliability17,49,50. While originally designed to assess autistic traits in the non-autistic relatives of individuals with ASD, the BAPQ’s strong psychometric properties17,49 and lack of ceiling effects53 in individuals with ASD suggest that it performs well at identifying autistic characteristics in clinical populations50 as well as the general population. As a result, the BAPQ has evolved into a well-used general measure for autistic traits in populations both with and without ASD53. All participants completed additional questionnaires to investigate traits of other psychiatric diagnoses, including the Liebowitz Social Anxiety Scale54 (avoidance questions) and the Avoidant Personality Disorder Impairment Scale55. The in-person participants additionally completed the ADOS-2 (module 4), a standard clinical assessment measure for ASD. Total algorithm raw scores for the ADOS were used in our analyses56.
To assess self-reported clinical diagnoses, all participants were asked whether they had ever been medically diagnosed with or hospitalized with a series of neurodevelopmental/psychiatric disorders. Space was also provided to write in any unlisted diagnoses.
To assess cognitive ability, all in-person participants completed either the Wechsler Abbreviated Scale of Intelligence 2nd edition57 or the Wechsler Adult Intelligence Scale 4th edition58. To compare across participants, we used the full-scale composite scores from whichever test was available. All online participants completed the 16-item International Cognitive Ability Resource test59.
Grouping
The full online sample (n = 502) was subdivided into those who scored above the cut-off score on the BAPQ (high-trait, n = 168) and those who scored in the bottom 25% on the BAPQ (low-trait, n = 121). To minimize potential differences between in-person and online samples, we selected age- and sex-/gender-matched participants from within both high- and low-trait online groups to match the in-person ASD group. To do so, we developed a matching function in R (see data and code availability for a link to the function) that takes each individual in the in-person ASD group and attempts to find a nearest-neighbor match by (1) selecting all remaining, unmatched individuals from the high-trait and low-trait groups that match the participant on gender and (2) selecting the individual from within the newly created matched-gender groups that is closest in age. If there is no gender match (that is, for the non-binary individuals, of whom there were more in the in-person than online groups) or if gender information is missing, the function matches on sex assigned at birth. This resulted in three groups with 56 participants each: high-trait, low-trait and ASD.
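The matching logic described above can be sketched as follows. This is a Python illustration only, not the authors' R function (which is linked in the data and code availability statement). The `Person` fields, the order in which ASD participants are processed and the tie-breaking rule (first candidate in pool order) are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    pid: str
    age: float
    gender: Optional[str]  # None when gender information is missing
    sex: str               # sex assigned at birth


def match_group(targets: List[Person], pool: List[Person]) -> Dict[str, Person]:
    """Greedy nearest-neighbor matching without replacement.

    For each target (in-person ASD participant), restrict the unmatched pool
    to the same gender; if gender is missing or no gender match remains, fall
    back to sex assigned at birth. Then take the candidate closest in age.
    """
    remaining = list(pool)
    matches: Dict[str, Person] = {}
    for target in targets:
        candidates: List[Person] = []
        if target.gender is not None:
            candidates = [p for p in remaining if p.gender == target.gender]
        if not candidates:
            # Fallback described in the text: match on sex assigned at birth.
            candidates = [p for p in remaining if p.sex == target.sex]
        if not candidates:
            continue  # ASSUMPTION: leave the target unmatched if the pool is exhausted
        best = min(candidates, key=lambda p: abs(p.age - target.age))
        matches[target.pid] = best
        remaining.remove(best)  # each online participant is used at most once
    return matches
```

Under this reading, the function would be called twice, once with the high-trait pool and once with the low-trait pool, to obtain two online groups matched to the same ASD participants.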
In addition to matched samples, we also assessed differences between our in-person ASD group and the full, unmatched high-trait and low-trait groups. The results remain consistent (see Supplementary Table 2 for further details).
Experimental paradigms
Social controllability task
The social controllability task26,27,34 investigates how individuals exert control over others to maximize rewards. Participants were paired with virtual players from two 30-person teams, denoted by a town name (‘Aldertown’ and ‘Banyan Bay’) as well as a common color for the background and team members’ shirts. In each trial, the virtual partner proposed a way to split $20 (for example, $8 for you, $12 for them), and the participant had to decide whether to accept or reject the offer. If the participant chose to accept, both parties received the proposed amounts. If they chose to reject the proposal, neither party received any money. Each team represented a different condition: controllable or uncontrollable. Although the participants were told that they ‘may or may not have influence over this team’s offers’ for both teams, they were not explicitly instructed that they had control over only one team, or which team represented which condition. The order of the conditions was randomized across participants. Importantly, a previous study using this task showed clear differences when participants were instructed that they were ‘playing with a computer’ instead of ‘playing with virtual human partners’, suggesting the human version of the task successfully probes social-specific behaviors34.
In the controllable condition, participants could either increase the value of the next offer by rejecting the current offer or decrease the value of the next offer by accepting the current offer. The amount of the offer change was determined in a probabilistic manner: 1 in 3 chance of changing the offer by $2, 1 in 3 chance of changing the offer by $1, and 1 in 3 chance of no change. By contrast, in the uncontrollable condition, offer amounts were randomly sampled from a predetermined distribution (mean = $5.00, s.d. = $2.30). In both conditions, the initial offer was $5, and the offers were constrained to be an integer between $1 and $9 (inclusive). At the end of the task, participants were asked to rate how much control they perceived they had over each team on a scale of 0–100%. The task was coded in Psychopy (in-person study, version 2021.1.4) and JavaScript (online study, version ES2019).
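The offer dynamics described above can be illustrated with a short Python simulation. This is a sketch, not the task code: the function names are hypothetical, the number of trials is left as a free parameter, and the uncontrollable condition is approximated here by a rounded, clipped normal draw, which is an assumption because the text specifies only the mean and s.d. of a predetermined distribution.

```python
import random

MIN_OFFER, MAX_OFFER, INITIAL_OFFER = 1, 9, 5


def clamp(offer):
    """Keep offers within the stated $1-$9 integer range."""
    return max(MIN_OFFER, min(MAX_OFFER, offer))


def next_offer_controllable(current, accepted, rng):
    """Rejecting raises the next offer and accepting lowers it.

    The step size is $0, $1 or $2 with probability 1/3 each.
    """
    step = rng.choice([0, 1, 2])
    direction = -1 if accepted else 1
    return clamp(current + direction * step)


def next_offer_uncontrollable(rng):
    """ASSUMPTION: a normal draw (mean 5.0, s.d. 2.3), rounded and clipped.

    Rounding and clipping shift the realized mean and s.d. slightly.
    """
    return clamp(round(rng.gauss(5.0, 2.3)))


def simulate_controllable(policy, n_trials, seed=0):
    """Return the sequence of offers seen under `policy(offer) -> accepted`."""
    rng = random.Random(seed)
    offer = INITIAL_OFFER
    offers = []
    for _ in range(n_trials):
        offers.append(offer)
        offer = next_offer_controllable(offer, policy(offer), rng)
    return offers
```

For instance, `simulate_controllable(lambda offer: False, 30)` models a participant who rejects everything, so offers can only rise toward $9, whereas an always-accept policy can only drive them down toward $1. This is the mechanism by which rejecting offers in the controllable condition leads to better offers later.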
Social navigation task
The social navigation task36 is a narrative-based game in which participants interact with a variety of virtual characters. To adapt the original task for use in a clinical population and allow added check-ins as needed, the task was divided into four blocks of roughly equal length, following the natural cut points in the narrative (that is, transition into a new scene). At the start of the game, participants were told they had just moved to a new town and needed to find a job and a place to live. They were asked not to overthink their choices and to behave as they would in real life. The task consisted of narrative trials, which contained images of characters and narrative-progressing text, and decision trials, in which the participant had to choose between two ways of interacting with a given character. To select a choice, participants pressed key 1 or 2 on the computer keyboard or in-scanner button box. Although the task appeared to follow a ‘choose your own adventure’ style of dynamic storytelling, the slides were the same regardless of participants’ decisions. The slides that appeared after the decision trials were written to have narrative continuity regardless of the specific decisions that were made. To minimize the potential for internal biases influencing results, the race (light- versus dark-skinned) and gender (masculine versus feminine presenting) of the characters were counterbalanced (for in-person participants) or randomized (for online participants) across versions. After the task, participants completed a set of questions, including ratings of how much they liked the characters, as well as a set of memory questions to ensure attention during the task.
Unbeknown to the participant, each decision trial in the task probed choices in either the affiliation or power domain. Affiliation decisions included, for example, whether or not to share physical touch, physical space or information (for example, to share their thoughts on a topic). Power decisions included, for example, whether to submit to versus issue a directive/command, or otherwise exert versus give control. Each option would lead to changes in opposing directions, coded as either +1 or −1 depending on whether it was pro- or anti-affiliative for the affiliation trials, or gave power to the character versus took power away from the character for the power trials. The order of the options within a decision trial was counterbalanced across participants. Over the course of the narrative, participants interacted with five different characters holding a variety of social roles, each with six affiliation decisions and six power decisions, for a total of 60 decisions. There was also a neutral character with three neutral decisions that did not change their social location; these trials were not included in these analyses.
Behind the scenes, participants’ choices during the decision trials moved the characters’ positions within a two-dimensional social space framed by axes of power and affiliation. Each character started at the origin, with neutral affiliation and power (0,0). With each decision, that character’s coordinates were updated in the positive or negative direction along the current dimension. If, for example, the participant chose the pro-affiliative option in an affiliation decision trial, that character would move one unit in the positive direction on the affiliation axis. Thus, at any point in the task, the characters’ two-dimensional coordinates were the cumulative sums of the participant’s affiliation and power decisions in those specific relationships. To get summary measures of participants’ social tendencies, we calculated the means of their decisions in the power and affiliation domains separately for each character and then averaged across characters. The task was coded in Psychopy (in-person study, version 2021.1.4) and JavaScript (online study, version ES2019).
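The coordinate updates and summary measures described above can be sketched in Python as follows. The data layout (a list of `(character, dimension, value)` tuples with `value` equal to +1 or −1) and the function names are hypothetical, and the sign convention for power trials (+1 when power is given to the character) is this sketch's reading of the coding scheme.

```python
from collections import defaultdict
from statistics import mean


def final_coordinates(decisions):
    """Cumulative (affiliation, power) position of each character.

    Every character starts at the origin (0, 0) and moves one unit along
    the probed dimension with each decision.
    """
    coords = defaultdict(lambda: {"affiliation": 0, "power": 0})
    for character, dimension, value in decisions:
        coords[character][dimension] += value
    return {c: (xy["affiliation"], xy["power"]) for c, xy in coords.items()}


def tendency(decisions, dimension):
    """Mean decision per character in one domain, then averaged across characters."""
    per_character = defaultdict(list)
    for character, dim, value in decisions:
        if dim == dimension:
            per_character[character].append(value)
    return mean(mean(values) for values in per_character.values())
```

With this coding, an affiliation tendency of +1 would mean the participant chose the pro-affiliative option on every affiliation trial, and a value near 0 would indicate an even mix of pro- and anti-affiliative choices.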
Statistics
For all analyses evaluating the relationships between continuous variables (task or trait), we utilized general linear models. For all analyses evaluating group differences in traits and task performance, or group-by-trait interaction effects on continuous variables, mixed-effects regression models containing a random intercept for each matched pair were conducted using the lme4 package in R. Degrees of freedom for these analyses were calculated using Satterthwaite’s method61 (the default for lme4 models fitted through the lmerTest package). Post hoc pairwise comparisons parsing the direction of group effects were conducted using the emmeans package in R. Estimated marginal means were calculated for each group, and pairwise contrasts were performed with Tukey’s adjustment for multiple comparisons to control the family-wise error rate and the Kenward–Roger method62 to approximate degrees of freedom. Significant trait interactions were followed up by two-tailed Pearson correlations to parse the direction of effects in each group, with P values FDR-corrected for multiple comparisons. All trait and task performance variables were continuous; the group variable was categorical, with the in-person ASD group defined as the reference. All statistical tests in the study controlled for age and sex.
To confirm differences in self-reported autistic traits (as designed through group selection), we tested for differences in BAPQ scores across all three groups. As an exploratory follow-up to further characterize the groups, we also tested for differences in traits of other psychiatric disorders characterized by differences in social behavior: avoidant personality disorder and social anxiety. To investigate agreement between self-rated and clinician-rated autistic traits, we tested for relationships between BAPQ scores and ADOS scores in the in-person sample. As an exploratory follow-up, we also tested for relationships between corresponding subdomains (restricted and repetitive behaviors: BAPQ ‘rigid’ subscale and ADOS ‘restricted and repetitive behaviors’ subdomain; social behavior: averaged BAPQ ‘aloof’ and ‘pragmatic language’ subscales and ADOS ‘social affect’ subdomain). For the social controllability task, we investigated group-by-condition interactions on overall rejection rates and perceived control. To further investigate rejection rate patterns, we evaluated group differences in rejection rates for high ($7–$9), medium ($4–$6) and low ($1–$3) offers, with P values FDR-corrected for multiple comparisons within each condition. For the social navigation task, we investigated group differences in power and affiliation behavioral tendencies, as well as self-report ratings of how much they liked interacting with the characters. Finally, to investigate whether the relationship between traits (for example, BAPQ scores) and subjective task ratings/social behavior differed as a function of group, we investigated group-by-trait and group-by-rating interactions on social navigation task variables.
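The offer-size analysis described above can be illustrated with a short Python sketch that computes per-bin rejection rates and applies an FDR adjustment. This is not the analysis code: the function names and input layout are hypothetical, and the Benjamini–Hochberg procedure is assumed here because the text does not name the FDR method used.

```python
OFFER_BINS = {"low": (1, 3), "medium": (4, 6), "high": (7, 9)}


def rejection_rates_by_bin(trials):
    """Rejection rate per offer bin for one participant and condition.

    `trials` is a list of (offer, accepted) pairs. Bins with no offers
    return None rather than a rate.
    """
    rates = {}
    for name, (low, high) in OFFER_BINS.items():
        rejected = [not accepted for offer, accepted in trials if low <= offer <= high]
        rates[name] = sum(rejected) / len(rejected) if rejected else None
    return rates


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (ASSUMED FDR method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this reading, the three bin-level group tests within a condition would form one family of P values passed together to `benjamini_hochberg`, consistent with the correction being applied within each condition.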
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.