The idea is to interpret protein sequences as sentences and their constituents, the amino acids, as single words [7]. More specifically, we will fine-tune the PyTorch ProtBert model from the Hugging Face library.
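To make the "sequence as sentence" idea concrete, here is a minimal sketch of how a raw amino-acid string could be turned into a whitespace-separated sentence before tokenization. It assumes the input convention described on the public ProtBert model card (uppercase residues separated by spaces, with the rare residues U, Z, O and B mapped to X). The helper name `to_protbert_input` is hypothetical and is not taken from the notebook.

```python
import re


def to_protbert_input(sequence: str) -> str:
    """Format a raw amino-acid string as a space-separated 'sentence'.

    Hypothetical helper: assumes the ProtBert convention of uppercase
    residues separated by spaces, with rare residues mapped to X.
    """
    # Drop any whitespace or line breaks and normalise to uppercase.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Map the rare residues U, Z, O and B to the placeholder X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Each amino acid becomes one "word" of the sentence.
    return " ".join(cleaned)


print(to_protbert_input("mkTaYiU"))  # M K T A Y I X
```

The resulting string is what would then be passed to a tokenizer, so that each amino acid is treated as a single token.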
The second part of the deploy-and-similarity-search.ipynb notebook takes a sample protein, Immunoglobulin Heavy Diversity 2/OR15-2A, calculates its embedding, and finds the 5 most similar proteins ...
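The similarity search can be illustrated with a small, self-contained sketch. It is not the notebook's implementation: it assumes that similarity is measured as cosine similarity between embedding vectors and that the candidate embeddings fit in an in-memory dictionary. The function names, the toy vectors, and the protein labels are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k entries of `index` closest to `query`, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real embeddings would have far more dimensions.
toy_index = {
    "protein_a": [1.0, 0.0],
    "protein_b": [0.0, 1.0],
    "protein_c": [1.0, 1.0],
}
query_embedding = [1.0, 0.1]

for name, score in most_similar(query_embedding, toy_index, k=2):
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is only practical for small collections; for a large protein database, an approximate nearest-neighbour index would be the usual choice.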