The term "T-cell receptors" here refers to the collective set of diverse receptor proteins found on the T-cells of an individual’s immune system. These receptors are highly diverse, which allows the immune system to recognize a broad array of antigens. The diversity arises from genetic recombination during T-cell development. Each T-cell carries a unique T-cell receptor that recognizes specific antigens, and the collection of receptors across the whole T-cell population is what lets the immune system respond to a wide range of potential threats.


T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling

Predicting the interactions between T-cell receptors (TCRs) and peptides is crucial for the development of personalized medicine and targeted vaccines in immunotherapy. Current datasets for training deep learning models for this purpose remain limited in the diversity of their TCRs and peptides. To address this data scarcity, we propose to extend the training dataset by physical modeling of TCR-peptide pairs. Specifically, we compute the docking energies of auxiliary TCR-peptide pairs whose binding status is unknown and use them as surrogate training labels. We then train our model on these extended example-label pairs in a supervised fashion. Finally, we find that the model's AUC score can be improved further by pseudo-labeling such unknown TCR-peptide pairs with a trained teacher model and re-training the model on the pseudo-labeled pairs. Our proposed method, which trains a deep neural network with physical modeling and data-augmented pseudo-labeling, improves over baselines on the two available datasets. We also introduce a new dataset that contains over 80,000 unknown TCR-peptide pairs with docking energy scores.
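The training procedure described above (surrogate labels from docking energies, a teacher model, pseudo-labeling, then re-training) can be illustrated with a minimal sketch. It is not the authors' implementation: a small logistic regression stands in for the deep neural network, the feature vectors are hypothetical numeric encodings of TCR-peptide pairs, and the energy threshold (-5.0), the confidence cutoff (0.9), and all function names are assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, x):
    # weights[0] is the bias; the remaining entries align with the features in x.
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train_logistic(X, y, epochs=200, lr=0.5):
    # Batch gradient descent on the logistic loss (stand-in for the deep model).
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, label in zip(X, y):
            err = predict_proba(weights, x) - label
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def energy_to_label(energy, threshold=-5.0):
    # Assumption: a lower docking energy suggests binding; the threshold is hypothetical.
    return 1 if energy < threshold else 0


def pseudo_label(weights, X_unlabeled, confidence=0.9):
    # Keep only the pairs on which the teacher is confident either way.
    kept_X, kept_y = [], []
    for x in X_unlabeled:
        p = predict_proba(weights, x)
        if p >= confidence:
            kept_X.append(x)
            kept_y.append(1)
        elif p <= 1.0 - confidence:
            kept_X.append(x)
            kept_y.append(0)
    return kept_X, kept_y


def teacher_student(X_lab, y_lab, X_aux, aux_energies, X_unl):
    # Step 1: extend the labeled set with docking-energy surrogate labels.
    X_ext = X_lab + X_aux
    y_ext = y_lab + [energy_to_label(e) for e in aux_energies]
    # Step 2: train the teacher on the extended set.
    teacher = train_logistic(X_ext, y_ext)
    # Step 3: let the teacher pseudo-label the unknown pairs.
    X_ps, y_ps = pseudo_label(teacher, X_unl)
    # Step 4: re-train a student on the extended set plus the pseudo-labeled pairs.
    student = train_logistic(X_ext + X_ps, y_ext + y_ps)
    return teacher, student


if __name__ == "__main__":
    # Toy two-dimensional features; real inputs would be learned sequence embeddings.
    X_lab = [[1.0, 1.0], [-1.0, -1.0], [1.2, 0.8], [-0.9, -1.1]]
    y_lab = [1, 0, 1, 0]
    X_aux = [[0.9, 1.1], [-1.2, -0.8]]
    aux_energies = [-8.0, -2.0]
    X_unl = [[1.5, 1.5], [-1.5, -1.5], [0.0, 0.0]]
    teacher, student = teacher_student(X_lab, y_lab, X_aux, aux_energies, X_unl)
    print("student P(bind) for [1, 1]:", predict_proba(student, [1.0, 1.0]))
```

In this sketch the confidence filter discards ambiguous pairs before re-training, a common choice in pseudo-labeling that limits the noise a teacher can pass to its student. Whether the original method filters in this way, or uses the docking energies as hard labels rather than as continuous targets, is not stated in the abstract.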