In bioinformatics, specialized annotation datasets are often very small, which impedes the development of successful machine learning models for the associated prediction tasks, especially deep learning models. To overcome the challenge of limited data, knowledge transfer strategies such as transfer learning and multi-task learning can be pursued.
One prediction task that stands to benefit from knowledge transfer is epitope prediction. An epitope comprises the amino acids of a protein that are involved in the binding of an antibody. Characterizing an antibody’s binding site on its antigen is crucial for the efficient use of antibodies in diagnostics and biomedical research, as well as for a deeper understanding of the immune response. While several machine learning models for epitope prediction have been developed, their accuracy is still limited and their results are not yet reliable; epitope prediction thus remains a major unsolved problem in bioinformatics. It has been shown that epitope prediction can benefit from related annotation data: including general protein-protein interaction (PPI) sites improved our Serendip-CE epitope predictor. Additionally, preliminary results indicate that multi-task learning is very effective for protein-protein interface prediction. As the binding of an antibody to a protein’s epitope can be considered a specific form of PPI, we anticipate that epitope prediction would benefit strongly from knowledge transfer from this area.
In this project, we will systematically evaluate which transfer learning approaches are most effective on epitope prediction data. We will compare multi-task learning against (i) the classic transfer learning approach, in which pretrained weights are simply used as the starting state for training on the epitope data, and (ii) an approach in which the pretrained network is used for regularization, so that deviations from the pretrained weights are penalised.
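
The three strategies differ mainly in the training objective, which the following minimal Python sketch tries to illustrate. It is not the project's implementation: the function names, the squared-deviation form of the penalty, and the parameters `strength` and `ppi_weight` are assumptions made for illustration, and the losses are plain numbers standing in for whatever loss an actual network would produce.

```python
from typing import Sequence


def deviation_penalty(weights: Sequence[float],
                      pretrained: Sequence[float],
                      strength: float) -> float:
    """Squared deviation of the current weights from the pretrained ones.

    Assumption: a quadratic penalty; other distance measures are possible.
    """
    return 0.5 * strength * sum((w - p) ** 2 for w, p in zip(weights, pretrained))


def finetune_objective(epitope_loss: float) -> float:
    """(i) Classic transfer learning: pretrained weights only set the
    starting state, so the objective is the epitope loss alone."""
    return epitope_loss


def regularized_objective(epitope_loss: float,
                          weights: Sequence[float],
                          pretrained: Sequence[float],
                          strength: float) -> float:
    """(ii) Regularization: the epitope loss plus a penalty for moving
    away from the pretrained weights."""
    return epitope_loss + deviation_penalty(weights, pretrained, strength)


def multitask_objective(epitope_loss: float,
                        ppi_loss: float,
                        ppi_weight: float) -> float:
    """Multi-task learning: epitope and PPI losses are optimized jointly.

    Assumption: a simple weighted sum with a hypothetical `ppi_weight`.
    """
    return epitope_loss + ppi_weight * ppi_loss


if __name__ == "__main__":
    pretrained = [0.0, 1.0]   # weights learned on PPI data (made-up values)
    current = [1.0, 3.0]      # weights after some epitope training (made-up)
    print(finetune_objective(0.4))
    print(regularized_objective(0.4, current, pretrained, strength=2.0))
    print(multitask_objective(0.4, ppi_loss=0.2, ppi_weight=0.5))
```

In this sketch, setting `strength` to zero reduces approach (ii) to approach (i), which is one reason the two are natural to compare alongside multi-task learning.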