Predicting which peptides bind to MHC molecules is critical in cancer immunotherapy and vaccine development. In the cancer field, computational tools are increasingly used for this purpose as part of next-generation sequencing (NGS) workflows, in which tumour DNA is sequenced to identify potentially immunogenic neo-antigens.
Current Prediction Tools and Their Drawbacks
Computational tools that predict MHC epitopes are highly attractive because they can focus experimental efforts on a smaller number of epitopes with good predicted binding, and a few such tools have emerged from academic research institutions and biotech companies in recent years. However, the currently available computational models, including the widely used algorithm netMHCpan, generate a large number of false positives.
This is because the performance of any prediction algorithm depends largely on the data used for training, which in the case of the current tools comes mainly from a few alleles that are common in Caucasian populations. As a result, current algorithms are of limited use for other populations. A broadly useful prediction tool should be trained on high-quality epitope datasets that cover a broad range of HLA alleles.
Immunitrack’s NeoScreen® Platform Used to Compare Current and Novel Epitope Prediction Tools
Researchers from the Dana-Farber Cancer Institute recently set out to develop a novel MHC/epitope computational model and compare its performance to netMHCpan. Immunitrack was delighted to take part in the work, which was recently published in Nature Biotechnology (Sarkizova et al., 2020). In this project, which involved collaborators from a number of leading US institutes, Immunitrack used its NeoScreen® platform to perform in vitro MHC/epitope affinity assessments on a selected subset of predicted epitopes for a number of MHC alleles. These measurements were then used to compare the predictions of netMHCpan with those of the newly developed software HLAthena, described below.
Development of HLAthena Software
To enable prediction of endogenous HLA class I-associated peptides across a large fraction of the human population, the researchers used liquid chromatography-tandem mass spectrometry (LC-MS/MS) to profile peptides eluted from cell lines that were engineered to express single MHC molecules. In the study, 184,464 peptides were eluted from cell lines covering 95 HLA-A, -B, -C and -G alleles.
From the mass spectrometry data, the researchers could identify canonical peptide motifs per HLA allele, unique and shared binding submotifs across alleles as well as distinct motifs associated with different peptide lengths. By integrating these data with transcript abundance and peptide processing, they developed a novel software tool called HLAthena, which provides allele-and-length-specific and pan-allele-pan-length prediction models for endogenous peptide presentation.
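The idea of a length-specific peptide motif can be illustrated with a minimal Python sketch. It groups peptides by length and computes the frequency of each amino acid at each position. This is only an illustration of what a motif is, not the method used to build HLAthena; the function name `motif_by_length` and the peptide sequences are hypothetical toy inputs, not data from the study.

```python
from collections import Counter, defaultdict


def motif_by_length(peptides):
    """Return {length: [ {amino_acid: frequency}, ... one dict per position ]}."""
    # Group peptides by length so each length gets its own motif.
    by_length = defaultdict(list)
    for peptide in peptides:
        by_length[len(peptide)].append(peptide)

    motifs = {}
    for length, group in by_length.items():
        positions = []
        for pos in range(length):
            # Count residues observed at this position across the group.
            counts = Counter(p[pos] for p in group)
            positions.append({aa: n / len(group) for aa, n in counts.items()})
        motifs[length] = positions
    return motifs


# Hypothetical toy peptides for a single allele (not from the study).
eluted = [
    "ALDKATVLL",
    "SLYNAVATL",
    "GLCTGVAML",
    "KLGGALQAK",
    "ILKEPVHGV",
    "ALDKATVLLV",  # a 10-mer, handled as a separate length
]

motifs = motif_by_length(eluted)
print(motifs[9][1])  # residue frequencies at position 2 of the 9-mers
print(motifs[9][8])  # residue frequencies at the last position of the 9-mers
```

In this toy input every 9-mer has leucine at position 2, so that position shows a single dominant residue, which is the kind of anchor preference a motif is meant to capture.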
A number of epitopes eluted from the engineered cell lines were predicted by netMHCpan 4.0-BA to be weak MHC binders but were shown to have strong affinity using NeoScreen®. In vitro testing of subsets of predicted peptides on the NeoScreen® platform supported the new model, which predicted endogenous HLA class I-associated ligands with a 1.5-fold improvement in positive predictive value compared with existing tools. It also correctly identified >75% of the HLA-bound peptides that were observed experimentally in 11 patient-derived tumour cell lines.
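To make the positive predictive value (PPV) comparison concrete, here is a minimal Python sketch. PPV is the fraction of predicted binders that are confirmed experimentally, and the fold improvement is the ratio of two PPVs. The peptide identifiers, the two tool outputs and the helper name `positive_predictive_value` are hypothetical; the sketch uses the generic definition of PPV and does not reproduce the evaluation protocol of the published study.

```python
def positive_predictive_value(predicted_binders, confirmed_binders):
    """Fraction of predicted binders that were experimentally confirmed."""
    predicted = set(predicted_binders)
    if not predicted:
        raise ValueError("PPV is undefined when nothing is predicted to bind")
    true_positives = predicted & set(confirmed_binders)
    return len(true_positives) / len(predicted)


# Hypothetical toy data: peptides confirmed as binders in an assay.
confirmed = {"PEP1", "PEP2", "PEP3", "PEP4"}

# Hypothetical outputs of two prediction tools (four predictions each).
existing_tool = ["PEP1", "PEP2", "PEP5", "PEP6"]  # 2 of 4 confirmed
new_tool = ["PEP1", "PEP2", "PEP3", "PEP7"]       # 3 of 4 confirmed

ppv_existing = positive_predictive_value(existing_tool, confirmed)
ppv_new = positive_predictive_value(new_tool, confirmed)
fold_improvement = ppv_new / ppv_existing

print(ppv_existing, ppv_new, fold_improvement)  # 0.5 0.75 1.5
```

A higher PPV means fewer false positives among the peptides selected for follow-up, so less experimental effort is spent on peptides that do not bind.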
Read the full article:
Sarkizova, S. et al., A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol 38, 199-209 (2020).