T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity

AdMaPlace March 9, 2025

0 Comments

Introduction

T-cell receptors (TCRs) are highly sequence diverse, somatically recombined receptor proteins. They govern the T-cell-mediated adaptive immune response by interacting with antigens such as peptides presented by the Major Histocompatibility Complex (pMHC)^1,2,3. The potential of TCRs as therapeutics for cancer and other diseases has led to substantial research effort dedicated to understanding and predicting their pMHC interactions and specificity^{4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}. Despite the importance of this task²¹, TCR complementarity remains extremely difficult to predict beyond well-explored case studies, with existing computational models failing to generalise to unseen data^22,23. The development of more generalisable approaches was recently labelled a ‘Cancer Grand Challenge’ by the US National Institutes of Health²⁴.

Currently, most TCR:pMHC specificity predictors use sequence based features only^{4,5,6,7,8,9,10,11,12,13,14,15,16,17,18}. The incorporation of structural information offers a compelling strategy to encourage models to learn more generalisable physicochemical principles of complementarity^25,26,27. However, accessing and generating structural information for TCR:pMHC specificity prediction poses new challenges, which is perhaps why only a handful of structure-aware methods exist to date^20,25,28. One bottleneck is the low quantity of available TCR structure data and the high experimental cost of obtaining more: only 605 publicly available crystal structures deposited in the PDB²⁹ at the time this study was conducted contained TCR structures³⁰, compared to the over 20,000 epitope-labelled paired TCR sequences available in VDJdb³¹. However, advances in computational TCR structure prediction provide a cheaper and higher throughput alternative to experimentally solving TCR structures. While historically the accuracy of adaptive immune receptor structure prediction algorithms has been poor^32,33,34, recent progress applying deep learning methods has made faster and higher-accuracy in-silico predictions of TCR structures viable^35,36. Simultaneously, federated learning initiatives are increasing accessibility to highly valuable datasets from the Pharmaceutical industry, which could potentially significantly boost the amount of available data for model training^37,38.

A second limitation of many TCR:pMHC specificity predictors is their inclusion of only beta chain information of the TCR^{4,7,8,11,12,22,39,40}. TCRs function as heterodimers, typically consisting of an alpha and a beta chain whose recombination and sequence diversification process relies on the same genetic mechanisms underpinning the synthesis of the heavy and light chain of naïve B-cell receptors (BCRs). Briefly, sequence diversity results from the recombination of multiple genes (VDJ for beta/heavy chains, VJ for alpha/light chains) leading to six structurally proximal complementarity determining region (CDR) loops, three per chain³. While the CDR1 and CDR2 loops are relatively less sequence variable because they are templated by the V-genes, the CDR3 loops lie across the gene junctions giving rise to combinatorial diversity in their sequence. Further N- and P- nucleotide insertions and deletions at the junctions in the CDR3 lead to substantially greater length and sequence diversity⁴¹. Overall, it is understood that the VDJ-recombined beta/heavy chain CDR3 (CDR3β/CDR3H) loop exhibits the greatest sequence diversity and thus contributes more to determining target specificity than the corresponding VJ-recombined CDR3 of the alpha/light chain (CDR3α/CDR3L)^3,42,43. However, while there is strong evidence that this holds for BCRs/antibodies, where the CDR3H loop appears to contribute disproportionately to antigen specificity relative to the CDR3L loop^44,45, it remains unclear whether the TCR CDR3β is more deterministic of antigen recognition than the CDR3α⁴⁶, implying that the exclusion of the alpha chain could lead to the loss of important information for specificity prediction.

Here, leveraging a proprietary dataset of TCR crystal structures solved by Immunocore to supplement the publicly available data, we evaluated the current headline performance of computational TCR modelling and identified which CDRs remain challenging to predict, representing barriers to the utility of structural models for predicting properties of TCRs such as pMHC specificity. We first clustered the complete structural dataset to identify relevant biases, which unexpectedly revealed that the CDR3α and CDR3β loops exhibit similar structural diversity, despite their differing recombination mechanisms.

We then built TCRBuilder2+ by retraining TCRBuilder2³⁶, a best-in-class TCR specific structure predictor, on the expanded dataset of structures. The performance of TCRBuilder2+ improved for genes better sampled in the new training set, and TCRBuilder2+ achieves comparable overall accuracy to Alphafold Multimer at a fraction of the computational cost. Consistent with the clustering analysis, both TCRBuilder2+ and Alphafold Multimer found the VJ-recombined CDR3α loop at least as challenging to model as the corresponding VDJ-recombined CDR3β loop, differentiating TCR structure prediction from BCR/antibody structure prediction, where the CDR3L is substantially easier to predict than the CDR3H^36,47. In order to further explore the structural diversity of TCRs we used TCRBuilder2+ to predict the structures of over 1.5 million TCR sequences from the Observed T-cell receptor Space (OTS) database⁴⁸.

Finally, we investigated potential sources of the high structural diversity in the CDR3α. The amino acid composition and genetic coherence of distinct structures suggest that the greater number of TRAJ genes relative to both TRBJ and IGKJ/LJ genes, which leads to greater combinatorial diversity in the alpha chain, may also lead to enhanced structural variability.

Our study is the first to elucidate the surprisingly high structural complexity of the TCR CDR3α and the challenge of predicting its structure, and reinforces the value of paired-chain information for capturing the function of TCRs.

Results

Curating and analysing a supplemented dataset of solved TCR structures

The amount of experimentally determined TCR structure data in the public domain is significantly less than TCR sequence data³⁰, therefore harnessing structural features in general TCR property prediction pipelines requires the ability to make accurate, and ideally fast, predictions of TCR structures. In recent years, deep learning methods have emerged as state of the art for this task³⁶. However, all machine learning models are limited by the data available to train them on and there are countless examples of deep learning models that generalise poorly or that learn dataset biases rather than the underlying function governing the phenomenon^23,49,50,51.

In order to explore how increased structural data could improve TCR structure prediction, we supplemented the publicly available data with an internal repository of TCR structures solved by Immunocore, retraining our latest high-throughput TCR-specific deep learning model, TCRBuilder2³⁶ (Fig. 1a). As an initial step we analysed the novel dataset of TCR structures to identify any residual or newly introduced data biases and to enable fairer downstream model evaluation.

**Fig. 1: Overview of the architecture and training data distribution of TCRBuilder2+.**

The original training data for TCRBuilder2³⁶ contained 704 experimentally resolved TCR structures sourced from STCRDab^30,52. These structures were obtained from 463 distinct PDB files, with 349 of the 704 structures having a unique sequence. Our novel data contained 204 Immunocore TCR structures, of which 201 sequences are unique and only 6 overlap with the existing training set (Fig. 1b). This expansion contributed to a combined set of 908 structures, of which 544 are unique in sequence. Filtering for alpha-beta TCR pairs resulted in 881 TCR structures to train TCRBuilder2+, of which 524 are sequence non-redundant. Furthermore, since the original TCRBuilder2 was published more TCR structures have become publicly available, allowing for the curation of a larger and more robust test set. We generated a new, independent, and non-redundant test set of 45 alpha-beta TCR structures for downstream benchmarking (SI Table 1).

Figure 1c shows the distribution of TRAV and TRBV gene subgroups of the 881 TCR structures contained in our new training set, characterising the biases present in the data. We report the distribution at the level of individual genes in SI Fig. 1. There are strong biases towards a small number of genes and overall the joint distribution over pairs is sparsely sampled, with the majority of alpha/beta pairs unsampled or only appearing in the dataset once. The joint distribution is disproportionately weighted towards a small subset of gene pairs; for example, TRAV12-2 paired with TRBV6-5 and TRAV1-1 with TRBV6-1 account for 79 and 59 structures respectively, 15.7% of the training data.

Observations of naturally occurring alpha/beta chain pairings imply variable gene pairings are relatively uniform after accounting for gene abundance⁴⁸, indicating that it is the selection of TCR structures that have been experimentally resolved that is giving rise to the TRAV and TRBV gene bias, not the natural underlying TCR distribution. To disentangle the contributions of the STCRDab and the Immunocore datasets to the bias we also considered the distributions independently (SI Fig. 2). While we observed similar biases within each set, our new dataset improves the sampling of human TRAV-TRBV gene subgroups and gene pairs. We saw a similar trend of bias and improved coverage in our new dataset for TRAJ-TRBJ pairs, as well as VJ pairings, shown in SI Figs. 3–5.

Computationally predicting TCR structures from their sequences using data-driven methods such as deep learning or homology relies on there being a functional mapping from sequence to structure that can be extracted from the existing data. Therefore, we used the supplemented structural data to investigate the relationship between sequence and structure identity of most variable TCR loops, the CDR3α and CDR3β (Fig. 1d–e).

There is a disparity in trend between sequence identities of CDR3 loops greater than 65% and between 0% and 65%. At lower sequence identities, structural distances are weakly correlated (PCC −0.17 to −0.29). Higher sequence identities are moderately correlated with smaller structural distances (PCC −0.52 to −0.66), though, even between loops of identical sequence, substantial structural deviation can be observed: CDR3β and CDR3α loops with very high sequence identities (≥ 90%) can display structural dissimilarities of up to 4 Å (Fig. 1d, e, SI Fig. 6). We observed a similar trend when considering the pairwise LDDT distribution against sequence identity (SI Fig. 7). These results demonstrate the known limitations of homology approaches to TCR structure prediction, as there is often more complexity in what determines CDR3 structure than sequence identity of the loop alone. Deep learning approaches, such as the TCRBuilder2 architecture, have been shown empirically to be able to capture more subtle and complex patterns from data, yielding improvements in predictive capability^35,36,47.

The sequence identity to structural distance distributions in Fig. 1d and e also provide useful background distributions against which to evaluate the predictive performance of TCR structure prediction methods, since they provide an upper bound of the expected RMSD for a given prediction if a template with the closest sequence identity to the query sequence were selected. We observed a wide range of structural distances between CDR3 loops across all sequence identities, many of which are substantially greater than 2.0 Å.

Finally, comparing the pairwise RMSD distributions of the CDR3α and CDR3β loops reveals that the CDR3α loop (Fig. 1d) is as inherently structurally diverse as the CDR3β loop (Fig. 1e, SI Fig. 8). We confirmed this finding by clustering each of the CDR3 loop types by RMSD (SI Table 2). Surprisingly, despite their simpler genetic composition CDR3α loops do not cluster into fewer structural clusters than CDR3β loops; by contrast, the CDR3L loops of antibodies cluster into far fewer shapes than their CDR3H counterparts^53,54,55,56. This disparity between TCR and antibody structures hints that the structures adopted by CDR3α loops may be more complex than suggested by their genetic recombination mechanism alone.

Benchmarking state-of-the-art TCR structure prediction models

We retrained TCRBuilder2 on our supplemented dataset to generate a new model (‘TCRBuilder2+’), capturing the current performance ceiling of a TCR-specific deep learning model for high-throughput structure prediction. While the original TCRBuilder2 model followed from ABodyBuilder2³⁶ and selected the ensemble of trained models based on only the beta chain validation set accuracy, in this work we adapted the ensemble selection process to include the accuracy over both chains. This adaptation better reflects TCR biology by accounting for the high structural diversity we observed in the alpha chain.

We benchmarked TCRBuilder2+ against the original TCRBuilder2³⁶, TCR-specific homology modellers (LYRA³⁴, and TCRmodel³³), an antibody-specific deep learning model (ABodyBuilder2³⁶), and the state-of-the-art general protein structure predictor Alphafold (Alphafold-Multimer⁵⁷ and Alphafold3⁵⁸). We report the mean RMSD and the RMSD distribution of the 45 test set structures predicted by different models in Table 1 and Fig. 2a, respectively.

Table 1 Mean RMSD of TCR structure predictions by different prediction models evaluated on the test set

Full size table

**Fig. 2: Evaluation of TCR structure prediction models on our test set (SI Table 1).**

As expected, predicting TCR structures with ABodyBuilder2 yields low accuracy predictions across all TCR regions despite the global similarities between TCRs and antibodies, confirming that structure prediction models trained on protein family subsets learn to make accurate predictions within the distribution of the data they are trained with, rather than generalising beyond it⁵⁹. The homology model LYRA³⁴ only produces predictions for sequences for which a suitable template is found, and so we are only able to benchmark 38 of the 45 test structures. We provide a comparison for those 38 predictions only in SI Figure 9. Overall, homology modelling with LYRA and TCRmodel³³ performs competitively for the framework regions (FW) and the germline encoded CDR2α, CDR1β and CDR2β, but struggles with the CDR3α, CDR3β, and CDR1α loops (Table 1, Fig. 2a).

TCRBuilder2, TCRBuilder2+, Alphafold Multimer, and Alphafold3 all exhibit competitive performance across the whole test set. The difference in mean RMSD errors as reported in Table 1 is less than 0.25 Å across all regions, although Alphafold3 exhibits a slightly lower mean across most regions. Similarly, the span of median RMSD across all four deep learning models is very tight (Fig. 2a) with overlapping error bars. It is worth noting that Alphafold3 has a later training data cutoff, and it is therefore not possible to exclude test/train overlap inflating its performance slightly.

TCRBuilder2+ performs similarly to TCRBuilder2 (Table 1), suggesting that current bottlenecks in TCR structure prediction are not alleviated by our supplemented training dataset. We have reported the p-values expressing the statistical significance of the differences between the models’ performance (SI Table 3), and found that, with the exception of ABodyBuilder2 which significantly underperforms, it is not possible to distinguish the methods using a 5% threshold across most regions.

Based on the median RMSD, TCRBuilder2+ prediction accuracy is on par with TCRBuilder2 and is only slightly outperformed by Alphafold-Multimer (Fig. 2a). Examining how frequently each method makes a prediction of reasonable accuracy (<2 Å, see Methods), we found that the framework regions (FW) and germline-encoded loops (CDR1 and CDR2) are largely well predicted. As was observed for the homology model, CDR1α is markedly more difficult to predict than CDR1β, with mean TCRBuilder2+ prediction RMSDs of 1.25 ± 0.81 Å and 0.75 ± 0.41 Å respectively.

Meanwhile, the CDR3β and CDR3α loop predictions exhibit the largest errors, as is expected due to their greater sequence and structure diversity relative to the germline-encoded loops.

Specifically, TCRBuilder2+ predicts 34 (75.6%) CDR3β structures with sub-2 Å RMSD, more than TCRBuilder2, Alphafold Multimer, and Alphafold3 which predict 26 (57.8%), 32 (71.1%), and 33 (73.3%) within that threshold, respectively (SI Table 4). On the other hand, Alphafold Multimer is able to predict 27 (60.0%) of CDR3α structures within 2 Å RMSD, whereas TCRBuilder2+, TCRBuilder2, and Alphafold3 respectively retrieve 21 (46.6%), 23 (51.1%), and 25 (55.6%) CDR3α structures within that threshold (SI Table 4). Every model benchmarked (with the exception of ABodyBuilder2) predicts CDR3β loops with a higher mean accuracy than CDR3α loops (Table 1, Fig. 2a), once again contrary to the conjecture that the greater junctional diversity of CDR3β compared to CDR3α ought to lead to higher structural complexity.

Due to the biases towards specific gene pairings in the training set (Fig. 1c), we then analysed the CDR3α and CDR3β predictive accuracy over different V-genes in the test set (Fig. 2b, c). Contrasting the two TCR-specific models, we find that predictive accuracy improves for genes that are better represented in the new training set (Fig. 1c, SI Fig. 10, 11), with TCRBuilder2+ resulting in, on average, higher accuracy CDR3α predictions than TCRBuilder2 for human TRAV19 and -21 and CDR3β predictions for human TRBV6, -7, and -9. This suggests that more data for a specific gene helps predict the structures of TCRs that derive from that gene, but that this improved performance might not necessarily translate to all TCRs generally. This in turn implies that increasing the amount and improving the genetic sampling of experimentally solved structure data with which models can be trained will be necessary to improve general inference of TCR structures.

To assess whether TCR-specific and general structure predictors struggle to model the same TCR sequences, we compared the RMSD of the TCRBuilder2+ and Alphafold Multimer CDR3α and CDR3β loop predictions for every test structure (Fig. 2d–e). Points in the green quadrant or red quadrant indicate both methods are able to (green) or unable to (red) predict the structure to within 2 Å, while points in the purple quadrant or cyan quadrant indicate that Alphafold Multimer (purple) or TCRBuilder2+ (cyan) is uniquely able to model the loop to within 2 Å. We observe for CDR3α loops that a substantial number (13/45) fall into the red quadrant, indicating that neither TCRBuilder2+ nor Alphafold Multimer can accurately predict their conformation. In contrast, very few CDR3β loops fall into the red quadrant (2/45), reiterating that CDR3α structure is more challenging to predict regardless of the model used.

A hypothetical ensemble model able to perfectly identify whether the TCRBuilder2+ or Alphafold Multimer prediction is a better CDR3β or CDR3α loop model—independent of the other loop—would currently achieve 95.6% under 2 Å for CDR3β (up from 75.6% for TCRBuilder2+ alone) or 71.1% under 2 Å for CDR3α (up from 60.0% for Alphafold-Multimer alone). This suggests these models have captured different information, and that a method that can selectively incorporate knowledge of general protein loop conformations into TCR loop structure prediction may translate to larger performance boosts for CDR3β than CDR3α. Furthermore, the fact that even when combining the best predictions of both TCRBuilder2+ and Alphafold-Multimer the CDR3α structure is more elusive to predict emphasises its structural complexity despite its simpler genetic recombination.

As an indication of whether deep learning approaches have learned to model the TCR variable region globally rather than considering loops independently of one another, we investigated the correlation between CDR3α and CDR3β predictive accuracy (Fig. 2f). The Pearson’s correlation coefficients (PCC) calculated for both Alphafold Multimer (0.633) and TCRBuilder2+ (0.586) indicate a slight correlation between CDR3α and CDR3β prediction RMSD, suggesting that deep learning models consider loop conformations somewhat cooperatively. While comparing the RMSD of the CDR3α and CDR3β of each TCR in the test set, we found that 64.4% (29/45) and 60% (27/45) of TCRBuilder2+ and Alphafold Multimer’s predictions respectively fall below the RMSD identity line (Fig. 2f), confirming that for the majority of test structures the CDR3α is more difficult to model than the CDR3β of a given TCR, regardless which model is used.

Finally, we explored whether TCR structural models have yet attained a level of accuracy where useful information can be extracted from TCR:pMHC interfaces predicted using physics-based docking⁶⁰ and present the extended results of these experiments in the Supplementary Information (SI Note 1). We found that a total RMSD over both the CDR3α and CDR3β < 3 Å strongly supports the retrieval of an accurate docked pose (interface RMSD over all heavy atoms < 4 Å in all cases we ran simulations for, indicating that structure predictions are beginning to show merit for TCR:pMHC interface prediction. Our evaluation of current state of the art structure prediction models suggests a third of TCRs are modelled within this threshold (green region in Fig. 2f), indicating further improvements are required to enable consistent interface prediction via docking. It is not possible to identify a consistent accuracy threshold for docking when considering the CDR3α or CDR3β individually (SI Fig. 12–14), highlighting once again the importance of accurately modelling both regions for TCR:pMHC interface prediction.

In summary, retraining TCRBuilder2 with substantially more training data did not yield large gains in average predictive accuracy, although the mean RMSD for CDR3β reduced from 1.84 Å (TCRBuilder2) to 1.78 Å (TCRBuilder2+), narrowing the gap to Alphafold Multimer’s benchmark of 1.52 Å. We did observe improvements across TRAV and TRBV genes for which the abundance of training examples increased, suggesting that increasing the diversity of training examples by experimentally solving as yet unsampled genes should yield accuracy improvements. Physics-based docking studies indicate that, while predicted TCR structures are beginning to show utility for predicting TCR:pMHC interfaces, improvements in TCR structure prediction accuracy should also yield improvements in docking. Furthermore, and in line with our observations of experimentally solved TCR structures, we found that the CDR3α is consistently more challenging to predict than the CDR3β, with evaluations of all models suggesting that the genetically simpler CDR3 junction region, CDR3α, remains a key barrier to accurate TCR modelling.

Enabling structure-based deep learning research with 1.5 million TCR structure predictions

Next, we estimated to what degree the apparent diversity of alpha chain CDR structures is sampled by natural TCR repertoires. To assess this, we harnessed the computational efficiency of TCRBuilder2+ and predicted the structures of over 1.56M non-redundant TCR sequences from the newly released OTS database⁴⁸, yielding an approximately 2500-fold increase in TCR structures in the public domain. We then conducted a structural analysis, contrasting the TCRs in OTS to the BCRs/antibodies from OAS⁶¹. Our previous analysis of crystal structures suggested that the CDR3α of TCRs is as structurally diverse as the CDR3β; here, we observe that TCR CDR3α loops from T cell repertoire sequencing also exhibit this diversity.

Figure 3a and b show that the CDR1 and CDR2 loops of the TCR alpha and beta chains (a), and the antibody light and heavy chains (b) cluster into canonical forms, as has been shown to be the case for experimentally solved structures^62,63. Specifically, we find that for 1000 randomly sampled TCRs from our OTS structure predictions, the CDR1α, CDR2α, CDR1β, and CDR2β loops fall into 126, 200, 34, and 105 clusters, respectively, when applying a 1 Å RMSD threshold (SI Table 5). Repeating this clustering with 1000 antibodies from OAS, we find that the CDR1L, CDR2L, CDR1H, and CDR2H separate into 31, 9, 19, and 36 distinct loop conformations (SI Table 6).

**Fig. 3: Structure distributions of 1000 randomly sampled TCRBuilder2+ predictions from the OTS database⁴⁸ and 1000 randomly sampled ABodyBuilder2³⁶ predictions from the OAS database⁶¹.**

In agreement with our analysis of experimentally solved TCR structures, neither the CDR3α nor CDR3β of naturally occurring OTS TCRs cluster into distinct structures: 1000 random TCRs fall into 840 and 828 clusters for CDR3α and CDR3β respectively (Fig. 3c). As expected, the antibody CDR3H also does not cluster into distinct shapes (880 clusters per 1000 antibodies), but the CDR3L loop does (131 clusters per 1000 antibodies). Comparing the clustering of antibody CDR3L to TCR CDR3α structures exemplifies the increased structural diversity of the CDR3α despite both loop sequences being the product of only VJ recombination.

The pairwise RMSD distributions of the CDR3α and CDR3β of the OTS predictions are indistinguishable (Fig. 3d), confirming that TCRBuilder2+ generates equally diverse conformations of these regions when predicting sequences in OTS. On the other hand, the RMSD distribution of the antibody CDR3L is shifted towards lower RMSDs relative to CDR3H and exhibits the bimodality expected of loops sampled from distinct canonical shapes; this bimodality is not observed for the TCR CDR3α. These results are recapitulated if we consider loop length matched distributions of RMSD (SI Figs. 15–17).

Together, this analysis of predicted TCR and antibody structures demonstrates the difference in structural diversity of the VJ recombined CDR3α and CDR3L loops. This concurs with our finding from the analysis of TCR crystal structures and the evaluation of TCR prediction models that CDR3α structural complexity matches that of CDR3β.

Potential explanations for CDR3α structure diversity

Finally, we sought an explanation for the unexpectedly high structural diversity of CDR3α by investigating the CDR3α and CDR3β sequence distributions (Fig. 4). Despite displaying similar overall sequence diversity (SI Fig. 8), CDR3α sequences cluster by sequence identity, whereas the CDR3β sequences do not (Fig. 4a, b). Greedy clustering confirms this: fewer CDR3α than CDR3β clusters emerge for sequence thresholds between 50% and 100% (Fig. 4c, SI Table 9), suggesting that CDR3α sequences can be separated into sets of similar sequences.

**Fig. 4: CDR3α and CDR3β sequence diversity exhibit different length scales.**

To expand on this observation, we investigated the sequence logos of the CDR3α (Fig. 4d) and CDR3β (Fig. 4e), finding as expected that the central regions of both loops are highly diverse. However, the variability of the central CDR3β residues is greater than that of the central CDR3α residues, implying that diversity emerging from VDJ recombination with the additional junctions relative to the VJ recombination does increase the sequence diversity of the CDR3β relative to CDR3α. Both the clustering and the slightly lower variability of the CDR3α sequences contrasts with our observations of CDR3α and CDR3β structures, where CDR3α exhibits similar diversity to CDR3β.

Both the CDR3α and CDR3β loop sequences have high glycine and serine content (Fig. 4d and e). Following from this observation we investigated the proportion of residues that could be conducive to flexibility or the emergence of multiple conformations per sequence, which could provide a potential explanation for structural diversity. Figure 4f and g suggest that the proportion of small residues conducive to loop flexibility is approximately 50% for both the CDR3α (f) and CDR3β (g). Furthermore, studies of TCR structure^64,65,66 and in-silico simulations^66,67 have suggested that both the CDR3α and CDR3β structures may be flexible. Though it is difficult to analyse robustly computationally, we explored this hypothesis via a preliminary molecular dynamics case study of an apo TCR, the results of which we provide in the Supplementary Information (SI Note 2). Both TCRBuilder2+ and Alphafold Multimer produce an ensemble of structure predictions (SI Fig. 18, 20), enabling us to initialise the simulations from distinct predicted coordinates. In our case study we found that an array of conformations of the CDR3α and CDR3β loops predicted by TCRBuilder2+ and Alphafold Multimer appear to be physically plausible local energy minima of the simulations (SI Fig. 21). Therefore, while both the amino acid composition and the molecular dynamics simulations support the possibility that both CDR3α and CDR3β loops are flexible, it is not possible to distinguish CDR3α from CDR3β or suppose that one is more or less rigid than the other.

To further explore whether the emergence of multiple conformations drives similar structural diversity despite reduced sequence diversity of the CDR3α relative to the CDR3β, we investigated how deterministic TRA or TRB VJ gene pairing is of CDR3α or CDR3β loop structure, respectively (Fig. 4h). For all TRAV-TRAJ and TRBV-TRBJ combinations for which at least five structures were observed in the TCRBuilder2+ training data, we calculated how many distinct CDR3α and CDR3β structures existed, respectively, using a 2 Å RMSD threshold. We observed that 44% of TRAV-TRAJ pairs map onto a single CDR3α structure, while only 22% of TRBV-TRBJ pairs map onto a single CDR3β structure. This suggests, in agreement with the expected effect of the insertion of the diversity gene and the additional junction in the CDR3β loop, that the VJ gene pair is less deterministic of CDR3β than CDR3α structure. The high structural coherence of TRAV-TRAJ combination also suggests that CDR3α structures are relatively well-defined for a given gene pair.

However, the better clustering of the CDR3α sequences relative to the CDR3β and high coherence of CDR3α structure to TRAV-TRAJ gene pair raises the question where the equivalent CDR3α and CDR3β structural diversity we have observed originates from. Following our observation that the ratio between the number of CDR3α and CDR3β clusters is dependent on the sequence identity threshold, we investigated the combinatorial diversity of the CDR3α and CDR3β that arises from their genetic recombination mechanisms. As is well established³, we found that the number of potential TRAV-TRAJ pairings outnumbers those of TRBV-TRBJ (SI Tables 7, 8). Even when accounting for the diversity gene, the potential combinatorial diversity of the beta chain is only about a third of that of the alpha chain, primarily due to the relatively low number of TRBJ genes compared to TRAJ^3,68. In conjunction with our observation that, when clustering CDR3α and CDR3β sequences permissively (20–40% sequence identity), more sequence clusters emerge for CDR3α than CDR3β (Fig. 4c, SI Table 9), this suggests that CDR3α sequences may occupy tighter but more distinct regions in sequence space, as their primary diversification occurs at the level of genetic recombination. On the other hand, CDR3β sequences appear to exhibit greater variability at the level of individual amino acids, but are constrained by a smaller pool of genetic recombinations. It is therefore possible that CDR3α structural diversity is driven by the larger combinatorial pairing space provided by the larger number of functional TRAJ genes (51) relative to functional TRBJ genes (13).

Discussion

Incorporating structural information into TCR specificity prediction methods offers a compelling strategy towards alleviating their current limitations^22,23 by encouraging models to learn more generalisable physicochemical principles of complementarity in lieu of amino acid sequence motifs^25,26. Given the current sparsity of solved TCR structure data³⁰ and the expense of generating it, in-silico structure prediction is a helpful tool to provide estimates of the 3D properties of TCRs.

In this work, we have retrained a state-of-the art, high-throughput deep learning architecture for TCR structure prediction with the largest training set to date (TCRBuilder2+). We have made the resulting TCRBuilder2+ model parameters freely available (see Data Availability).

Evaluation of our model alongside Alphafold Multimer and Alphafold3 robustly characterises the current state-of-the-art of TCR structure prediction – all three models achieved median accuracies sub-2 Å RMSD across all regions, although the CDR3α and CDR3β loops remain discernibly harder to predict than the other loops. While performance was similar across the whole test set, on average TCRBuilder2+ showed CDR3 predictive performance improvements for genes for which the number of examples had increased in our new training set (Fig. 2b–c, SI Fig. 11). This aligns with related research exploring the generalisability of antibody structure prediction methods⁵⁹ as well as with more general work applying machine learning to biological data, which has shown that the quality, quantity, and sampling of the training data is critical for model performance^49,69,70. Furthermore, despite the large increase in non-redundant training data, we did not observe a statistically significant improvement in prediction accuracy across the whole test set compared to TCRBuilder2.

Supportive of the hypothesis that accurate TCR structure predictions may unlock additional information for downstream tasks such as specificity prediction²⁵, we found that the performance of docking simulations was linked to structure prediction accuracy (SI Note 1). We observed that a third of top-ranked TCR structure predictions exhibit sufficiently accurate CDR3α and CDR3β loops to enable docking within 4 Å interface RMSD (Fig. 2f), implying that, while an appreciable proportion of TCR structure predictions could already be useful for physics-based docking, there is still scope for improvements in prediction accuracy to benefit downstream tasks. Furthermore, disparities between poses with the best docking scores and the true best docks in our case studies suggest bespoke TCR:pMHC scoring functions may also be beneficial.

In agreement with prior findings, our simulation results suggest that the CDRs of TCRs may be flexible and may exhibit a degree of induced fit upon binding to the pMHC^64,65,66,67. This would imply that next-generation structure predictors would benefit from information about a partner antigen to generate more complementary TCR structures, and that a hypothetical, perfect structure predictor would require information about the antigen to predict an exact single prediction. Moreover, current structure prediction models are predominantly trained on TCRs in complex with antigen, since these are considerably more abundant in the existing data³⁰. If the antigen significantly influences the eventual bound structure of the TCR, then it is likely that current models will struggle to accurately predict apo TCR structures. While neither Alphafold Multimer nor TCRBuilder2+ were trained to predict multiple conformations and evaluating flexibility of the predictions is challenging, both are comprised of ensembles, and as such multiple distinct predictions for a single TCR sequence can be generated (SI Fig. 20), at the expense of computational resource.

The inference speed of TCRBuilder2+ (approximately 250x faster than Alphafold Multimer in our experiments (SI Table 10) enabled the prediction of over 1.5 million sequence non-redundant TCR structures from repertoire studies, the first dataset of its kind⁷¹. Structurally characterising repertoire data has been shown to yield clinically relevant insights for BCRs^72,73,74 and has enabled the training of deep learning architectures on structural data at previously inaccessible scales^75,76. Our release of the OTS structure predictions has the potential to enable similar advances in computational TCR research.

We found that both TCR-specific and general protein structure predictors modelled the VJ-recombined CDR3α loop with similar accuracy to the VDJ-recombined CDR3β loop. Our analysis of the landscape of solved TCR structures also found that the VJ-recombined junction in the alpha chain appears to be as structurally diverse as the VDJ-recombined junction of the beta chain, and suggests that CDR3α loop structures do not readily cluster into canonical forms. Prior works that assigned CDR3α loops to canonical structures presented evidence supportive of this finding, showing that CDR3α is not assigned to a cluster by sequence as consistently as CDR1α & 2α⁶², and that structure inference based on class assignment to a canonical loop yields less accurate results for CDR3α than CDR3β³⁴. For example, our 2019 paper⁶² shows that while 76.8% (43/(43+13)) of CDR1α loop structures and 72.0% (36/(36+14)) of CDR2α loop structures were assigned to a canonical cluster, only 22.4% (24/(24 + 83)) of CDR3α loops could be assigned. This contrasts strongly with the trends seen across B cell receptors and antibodies: despite analogous CDR3 diversification mechanisms, CDR3L loops predominantly fold into canonical structures, while CDR3H structures are substantially more structurally diverse^36,47,62.

One potential functional rationale for antibodies and TCRs to have evolved these different levels of structural diversity in their VJ junction is that, in antibodies, the CDR3L tends to plays a supportive role in binding, with the CDR3H driving antigen specificity and binding⁴⁵, whereas current evidence suggests that the TCR CDR3α and CDR3β play a more equal role in pMHC binding. For antibodies this has been substantiated through recent observations that the heavy chain clonotype is predictive of binding specificity but not the light chain clonotype⁴⁴ and metadynamics studies suggesting that CDR3H loops sample conformations in the space left unoccupied by the CDR3L loop and not vice versa⁷⁷.

While elucidating the precise rules of TCR to pMHC binding remains an active topic of research, structural analyses have shown that the CDR3α loop and the CDR1α loop, which is also difficult to predict despite being germline encoded (Table 1), frequently make direct peptide contact in TCR-pMHC interfaces^30,78,79, suggesting that additional information about the antigen or binding mode may be useful for attaining accurate structure predictions. Furthermore, unlike CDR3L loops relative to CDR3H loops, the CDR3α samples a similar length distribution to CDR3β⁷⁸ (SI Fig. 15). Given the frequent contact between the peptide and both the CDR3α and CDR3β, it would be coherent for evolutionary pressure to have increased CDR3α structural diversity alongside CDR3β diversity to achieve better peptide sensitivity.

Supportive of this hypothesis, the number of human TRA and TRB locus genes indicate that the potential combinatorial diversity of the alpha chain is greater than that of the beta chain, compensating somewhat for the presence of only one junction region rather than two (SI Tables 7, 8)⁶⁸. This appears to be driven primarily by the greater number of TRAJ genes (61) relative to both TRBJ (14) and IGLJ/KJ (11/5). Alpha chain combinatorial and structural diversity substantially exceeds that of the antibody light chain, which remains less structurally diverse despite undergoing additional somatic hypermutation diversification. Overall, this outsized diversity in the alpha chain VJ junction, coupled with relative data paucity, likely contributes substantially towards the CDR3α prediction errors being on-par with those of the CDR3β across all models tested. However, the high structural coherence of TRAJ-TRAV gene pairs and observed predictive improvements over better sampled genes suggests that supplementing existing TCR structure data with as yet unsolved crystal structures could yield substantial improvements in prediction accuracy.

TCRBuilder2+ allows TCR structures to be predicted at high-throughput with sub-2 Å median RMSD across all regions, but near-term improvements on these values may require fundamental methodological changes that better reflect TCR biology. Our analysis highlights the structural variability of the TCR CDR3α, on par with that of the CDR3β, differentiating T cells from their B cell/antibody cousins and reinforcing the value of paired-chain sequencing data and incorporating both chains in TCR:pMHC specificity models. Finally, our high-throughput model, TCRBuilder2+, trained on a supplemented dataset, enables the integration of TCR structure predictions into in-silico TCR screening and design pipelines, and, in combination with our release of ca. 1.5 million predicted structures of native TCRs, will help unlock the field of repertoire structural immunoinformatics.

Methods

Data curation

The original training data of TCRBuilder2 contained 704 experimentally resolved TCR structures sourced from STCRDab^30,36. These structures were obtained from 463 distinct PDB files, and 349 of the 704 structures are unique in sequence. We extended this training set with 204 Immunocore TCR structures, of which 201 sequences are unique and only 6 overlap with the existing training set.

Further filtering for alpha/beta TCRs yielded a total of 881 structures, of which 524 are unique in sequence. As a quality control we applied a resolution threshold of 3.5Å to the TCRs from STCRDab during training. We did not apply this threshold to the supplemented Immunocore data as all structures had a resolution of < 3.81Å (Table 2). Of the STCRDab training TCRs, 38.8 % are apo; 4.4% of Immunocore training TCRs are apo.

Table 2 Resolution statistics of training data

Full size table

TCRBuilder2 was evaluated on a relatively small test set of 21 structures⁵². Since the original work was published more TCR structures have become available, allowing for the curation of a larger and more robust test set. We updated the database of TCR structures, STCRDab³⁰, and generated a new, independent test set with the following criteria: TCRs must be paired alpha-beta chains, have lower than 3.5 Å resolution, be unique within the test set and be non-identical to a paired sequence in the training set. This process yielded 45 TCR structures, the PDB codes of which we have provided in the Supplementary Information (SI Table 1).

Gene annotation and numbering

We used ANARCI⁸⁰ with the IMGT gene set^68,81 retrieved 2023/09/04 to annotate and number TCR sequences.

TCRBuilder2+ model and training

We retrained the deep learning model TCRBuilder2³⁶ on our new dataset to create TCRBuilder2+. Like the original model, the neural network architecture of TCRBuilder2+ is based on the ‘structure module’ of Alphafold³⁵. The structure module is an SE(3)-invariant neural network architecture through which we pass the one-hot encoding of the TCR sequence as input. This encoding is then passed through eight sequential layers, each consisting of an invariant point attention module and a backbone update, which are used to update the translations and rotations of amino-acids represented as unconstrained planar triangles in Euclidian space. The model is trained via gradient descent with the RAdam optimiser⁸² using a variation of the Frame-Aligned Point Error (FAPE loss). We used early stopping on a withheld validation set to determine the point we consider the parameters to be sufficiently optimised. For further details please see the ImmuneBuilder methods section³⁶. However, because TCRBuilder2 followed from ABodyBuilder2, in the original work the model ensemble was selected to minimise the RMSD over the beta chain, whose genetic recombination is analogous to the heavy chain. We adapted the ensemble selection in TCRBuilder2+ to account for the structural diversity of the alpha chain by selecting the ensemble of models that minimises the RMSD over both the alpha and beta chain on the validation set. Specifically, ten models were independently trained, each with a partially randomised validation set. The validation sets each contained 30 TCRs which were held constant across all ten splits as well as 20 randomly selected TCRs. After training all ten models to convergence, we evaluated them on the constant subset of the validation set to select the four models with the lowest total RMSD over both the alpha and the beta chain. During inference we predicted four structures and selected the model closest to the ensembled average. The selected structure was then refined using the OpenMM implementation of the AMBER14 protein force field and returned as the final prediction^83,84. To produce an uncertainty estimate, we also calculated the sum of the squared distance of each model’s prediction x_i to the mean prediction μ: (epsilon =mathop{sum }_{i = 0}^{4}{({x}_{i}-mu )}^{2}). This provides an uncertainty proxy for the coordinates of every amino acid. To produce multiple conformations for both the molecular dynamics and docking simulations we returned all four models’ predictions, and minimised each of them with the AMBER14 force field.

TCR structure predictions

ImmuneBuilder

TCRBuilder2+, TCRBuilder2, and ABodyBuilder2 are all installable via the ImmuneBuilder package. We used the previously published versions of the weights for TCRBuilder2 and ABodyBuilder2, and the weights we trained as described above for TCRBuilder2+.

Alphafold Multimer

We used an instance of Alphafold Multimer⁵⁷ installed on our servers, and used version 2.3.1 of the weights. Due to constraints on the memory of the installation, we used the reduced version of the BFD database, as provided accompanying the Alphafold weights. We set a cut-off date of 2020/12/30 for templates, as this excludes all the test structures. However, it is possible that some of the test structures were included in the training data for Alphafold Multimer version 2.3.1, potentially giving the model a slight advantage.

Alphafold3

We used the Alphafold Server (alphafoldserver.com) accessed on September 10th, 11th and 12th 2024 (due to the 20 structure per day limit) to predict the TCR structures in our test set. It is possible that train test overlap is inflating the performance of Alphafold3 due to a training cut-off of 2023/01/12, and a template cut-off of 2022/09/28, which does not exclude our entire test set⁵⁸.

LYRA & TCRmodel

We installed LYRA³⁴ as part of the TCRpMHCmodel package⁸⁵ and ran inference withholding the target structure from the templates. We installed TCRmodel³³ with ROSETTA⁸⁶ and made predictions using the default settings.

Sequence numbering and gene annotation

We used IMGT numbering and gene annotations computed on amino acids by ANARCI^80,81. Table 3 shows the definitions of loops and anchors used. We refer to the CDR3 loops of the alpha and beta chain as CDR3α and CDR3β respectively.

Table 3 Definitions for anchors and loops based on IMGT numbering

Full size table

Calculating structural deviation

We used the root-mean-squared-distance (RMSD) as the evaluation metric for the predictions, which was calculated over the amino acid backbone atoms (N, C_α, C, and O). We aligned each CDR by the backbone atoms of the anchor residues and calculated RMSD over the backbone atoms of residues in the loops. The loop and anchor residue numbers are provided in Table 3, and were annotated using ANARCI⁸⁰ according to the IMGT numbering scheme⁸¹.

To compare CDR3 loop structures of different lengths, we implemented spline up-sampling, the parameters of which we set such that the distance calculated is RMSD equivalent when the loops are of the same length. Further details can be found in the Supplementary Information (SI Note 3).

An RMSD threshold for an acceptable TCR structure prediction was based on our observations of the sequence to structure identity distributions of CDR3α and CDR3β (Fig. 1d & e). Since RMSD will naturally be greater for longer CDR3 loops due to the increased degrees of freedom in the backbone, we also investigated the background distribution within each loop length (SI Fig. 6). This confirmed 2.0 Å as a sensible threshold across the range of TCR CDR3 lengths.

Statistical significance of RMSD distributions

We used the p-values of a two-tailed statistical test on the t-values to accept or reject the hypothesis that the distributions of RMSD of different models are different from one another. The null-hypothesis is therefore that the RMSD distribution of models is the same. We used the commonly defined t-score: (t=| {mu }_{1}-{mu }_{2}| /sqrt{({sigma }_{1}^{2}+{sigma }_{2}^{2})/N}), and used the survival function as implemented in scipy, with N − 1 degrees of freedom, to calculate the p-value, which we doubled, accounting for the test being two-tailed: p = 2*survival_function(t, N − 1). N = 45, the number of structures in the test set.

Calculating LDDT scores

We calculated the local-difference-distance test (LDDT) metric⁸⁷ using the implementation provided within the ‘compare-structures’ interface of the OpenStructure software package⁸⁸.

Sequence identity

We calculated the pairwise sequence identity between sequences with the following formula:

$$,{mbox{alignment}}(se{q}_{i},se{q}_{j})/max ({mbox{len}}(se{q}_{i}),{mbox{len}},(se{q}_{j}))$$

We used alignment(. ) as implemented in Biopython as pairwise2.align.globalxx⁸⁹. The function returns the sum of the alignment of two sequences where the alignment contains 1 for every matched pair of amino acids and zero for every mismatched amino acid pair. No gap or insertion penalties were used.

Clustering

To cluster CDR loops by structure and sequence we implemented a greedy clustering algorithm which we applied to pairwise distance matrices (Alg. 1). For structures the distance matrices contained RMSD between structures, and upsampled RMSD in cases where loop lengths were different. For sequences we calculated a sequence distance matrix from the sequence identity matrix as d_i,j = 1 − i_i,j.

Algorithm 1

Greedy clustering

1: function ClusterD, N, τ

2: Let clusters ← {}

3: Let clustered_indices ← []

4: Let count ← 0

5: for i = 1 to N do

6: if i ∉ clustered_indices then

7: clusters[count] ← [i]

8: append i to clustered_indices

9: for j = i + 1 to N do

10: if j ∉ clustered_indicesandD_i,j < τ then

11: append D_i,j to clusters[count]

12: append j to clustered_indices

13: end if

14: end for

15: count ← count + 1

16: end if

17: end for

18: return clusters

19: end function

To visualise clusters we used the t-SNE dimension reduction algorithm⁹⁰ as implemented in scikit-learn⁹¹, with a perplexity parameter of 10. We used the t-SNE plots for visualisation purposes only and used the clustering algorithm as outlined above to colour the clusters.

Statistics and reproducibility

All TCR structure predictions were evaluated using a publicly available test set containing 45 TCRs deposited in the PDB (SI Table 1). We report the sample mean of the RMSD of the aligned structures against the crystal structures.

The box plots reported in Fig. 2, which visualise the RMSD evaluation, are defined as follows: boxes represent the inner quartiles of the data, with the median value the dividing line; whiskers extend to points that lie within 1.5 times the interquartile range of the lower and upper quartile; observations beyond this range are displayed independently. N = 45, the number of structures in the test set.

Categories: Protein databases, Protein structure predictions, Structural biology, VDJ recombination

Tags: Protein databases, Protein structure predictions, Structural biology, VDJ recombination

TCR catch bonds nonlinearly control CD8 cooperation to shape T cell specificity

Naturally evolved T-cell receptors (TCRs) exhibit remarkably high specificity in discriminating non-self antigens from self-antigens under dynamic biomechanical modulation. In contrast, engineered high-affinity TCRs often lose this specificity, leading to cross-reactivity with self-antigens and off-target toxicity. The underlying mechanism for this difference remains unclear. Our study reveals that natural TCRs exploit mechanical force to form optimal catch bonds with their cognate antigens. This process relies on a mechanically flexible TCR–pMHC binding interface, which enables force-enhanced CD8 coreceptor binding to MHC-α₁α₂ domains through sequential conformational changes induced by force in both the MHC and CD8. Conversely, engineered high-affinity TCRs create rigid, tightly bound interfaces with cognate pMHCs of their parental TCRs. This rigidity prevents the force-induced conformational changes necessary for optimal catch-bond formation. Paradoxically, these high-affinity TCRs can form moderate catch bonds with non-stimulatory pMHCs of their parental TCRs, leading to off-target cross-reactivity and reduced specificity. We have also developed comprehensive force-dependent TCR–pMHC kinetics-function maps capable of distinguishing functional and non-functional TCR–pMHC pairs and identifying toxic, cross-reactive TCRs. These findings elucidate the mechano-chemical basis of the specificity of natural TCRs and highlight the critical role of CD8 in targeting cognate antigens. This work provides valuable insights for engineering TCRs with enhanced specificity and potency against non-self antigens, particularly for applications in cancer immunotherapy and infectious disease treatment, while minimizing the risk of self-antigen cross-reactivity.

AdMaPlace March 9, 2025

0 Comments

Type 2 immunity in allergic diseases

Significant advancements have been made in understanding the cellular and molecular mechanisms of type 2 immunity in allergic diseases such as asthma, allergic rhinitis, chronic rhinosinusitis, eosinophilic esophagitis (EoE), food and drug allergies, and atopic dermatitis (AD). Type 2 immunity has evolved to protect against parasitic diseases and toxins, plays a role in the expulsion of parasites and larvae from inner tissues to the lumen and outside the body, maintains microbe-rich skin and mucosal epithelial barriers and counterbalances the type 1 immune response and its destructive effects. During the development of a type 2 immune response, an innate immune response initiates starting from epithelial cells and innate lymphoid cells (ILCs), including dendritic cells and macrophages, and translates to adaptive T and B-cell immunity, particularly IgE antibody production. Eosinophils, mast cells and basophils have effects on effector functions. Cytokines from ILC2s and CD4+ helper type 2 (Th2) cells, CD8 + T cells, and NK-T cells, along with myeloid cells, including IL-4, IL-5, IL-9, and IL-13, initiate and sustain allergic inflammation via T cell cells, eosinophils, and ILC2s; promote IgE class switching; and open the epithelial barrier. Epithelial cell activation, alarmin release and barrier dysfunction are key in the development of not only allergic diseases but also many other systemic diseases. Recent biologics targeting the pathways and effector functions of IL4/IL13, IL-5, and IgE have shown promising results for almost all ages, although some patients with severe allergic diseases do not respond to these therapies, highlighting the unmet need for a more detailed and personalized approach.

AdMaPlace March 9, 2025

0 Comments

Iron homeostasis and ferroptosis in muscle diseases and disorders: mechanisms and therapeutic prospects

The muscular system plays a critical role in the human body by governing skeletal movement, cardiovascular function, and the activities of digestive organs. Additionally, muscle tissues serve an endocrine function by secreting myogenic cytokines, thereby regulating metabolism throughout the entire body. Maintaining muscle function requires iron homeostasis. Recent studies suggest that disruptions in iron metabolism and ferroptosis, a form of iron-dependent cell death, are essential contributors to the progression of a wide range of muscle diseases and disorders, including sarcopenia, cardiomyopathy, and amyotrophic lateral sclerosis. Thus, a comprehensive overview of the mechanisms regulating iron metabolism and ferroptosis in these conditions is crucial for identifying potential therapeutic targets and developing new strategies for disease treatment and/or prevention. This review aims to summarize recent advances in understanding the molecular mechanisms underlying ferroptosis in the context of muscle injury, as well as associated muscle diseases and disorders. Moreover, we discuss potential targets within the ferroptosis pathway and possible strategies for managing muscle disorders. Finally, we shed new light on current limitations and future prospects for therapeutic interventions targeting ferroptosis.

AdMaPlace March 9, 2025

0 Comments

γδ T-cell autoresponses to ectopic membrane proteins: a new type of pattern recognition

T-cell receptor (TCR) γδ-expressing cells are conserved lymphocytes of innate immunity involved in first-line defense and immune surveillance. TCRγδ recognizes protein/nonprotein ligands without the help of the major histocompatibility complex (MHC), especially via direct binding to protein ligands, which is dependent primarily on the δ chain complementary determining region 3 (CDR3δ). However, the mechanism of protein‒antigen recognition by human γδ TCRs remains poorly defined. We hypothesize that γδ TCRs recognize self-proteins expressed ectopically on the cell membrane that are derived from intracellular components under stress. Here, we mapped 16 intercellular self-proteins among 21,000 proteins with a huProteinChip as putative ligands for Vδ1/Vδ2 TCRs, 13 for Vδ1 TCRs and 3 for Vδ2 TCRs. Functional tests confirmed that ectopic nucleolin (NCL) is a ligand for the Vδ1 TCR, whereas protein-glutamine γ-glutamyltransferase K (TGM1) is a ligand for the Vδ2 TCR. In the context of radiation exposure, the ectopic expression of intracellular proteins on the tumor cell surface is related to the increased antitumor cytotoxicity of γδ T cells both in vitro and in vivo. In conclusion, the recognition of intracellular proteins that are ectopically expressed on somatic cells by human γδ TCRs is a basic interaction mechanism that enables new types of immune pattern recognition and a novel γδ TCR-ligand-based strategy for tumor immunotherapy.

AdMaPlace March 9, 2025

0 Comments

Targeting of TAMs: can we be more clever than cancer cells?

With increasing incidence and geography, cancer is one of the leading causes of death, reduced quality of life and disability worldwide. Principal progress in the development of new anticancer therapies, in improving the efficiency of immunotherapeutic tools, and in the personification of conventional therapies needs to consider cancer-specific and patient-specific programming of innate immunity. Intratumoral TAMs and their precursors, resident macrophages and monocytes, are principal regulators of tumor progression and therapy resistance. Our review summarizes the accumulated evidence for the subpopulations of TAMs and their increasing number of biomarkers, indicating their predictive value for the clinical parameters of carcinogenesis and therapy resistance, with a focus on solid cancers of non-infectious etiology. We present the state-of-the-art knowledge about the tumor-supporting functions of TAMs at all stages of tumor progression and highlight biomarkers, recently identified by single-cell and spatial analytical methods, that discriminate between tumor-promoting and tumor-inhibiting TAMs, where both subtypes express a combination of prototype M1 and M2 genes. Our review focuses on novel mechanisms involved in the crosstalk among epigenetic, signaling, transcriptional and metabolic pathways in TAMs. Particular attention has been given to the recently identified link between cancer cell metabolism and the epigenetic programming of TAMs by histone lactylation, which can be responsible for the unlimited protumoral programming of TAMs. Finally, we explain how TAMs interfere with currently used anticancer therapeutics and summarize the most advanced data from clinical trials, which we divide into four categories: inhibition of TAM survival and differentiation, inhibition of monocyte/TAM recruitment into tumors, functional reprogramming of TAMs, and genetic enhancement of macrophages.

AdMaPlace March 9, 2025

0 Comments

Introduction

Results

Curating and analysing a supplemented dataset of solved TCR structures

Benchmarking state-of-the-art TCR structure prediction models

Enabling structure-based deep learning research with 1.5 million TCR structure predictions

Potential explanations for CDR3α structure diversity

Discussion

Methods

Data curation

Gene annotation and numbering

TCRBuilder2+ model and training

TCR structure predictions

ImmuneBuilder

Alphafold Multimer

Alphafold3

LYRA & TCRmodel

Sequence numbering and gene annotation

Calculating structural deviation

Statistical significance of RMSD distributions

Calculating LDDT scores

Sequence identity

Clustering

Algorithm 1

Statistics and reproducibility

Related Articles

Responses