Methods for Secondary and Tertiary Structure Prediction of Microproteins
Abstract
Microproteins are a newly recognized and rapidly growing class of small proteins, typically encoded by fewer than 100 to 150 codons and translated from small open reading frames (smORFs). Although research has shown that smORFs and their corresponding microproteins constitute a significant portion of the genome and proteome, little information is available in the literature on the structural characteristics of microproteins. In this paper, we discuss the methods available for predicting their secondary and tertiary structures and provide example calculations performed with three archetypal methods (AlphaFold, I-TASSER, and ROSETTA). We present predicted structures for 44 microproteins. For this set, the methods considered here show reasonable agreement with one another and with the very few cases in which experimental structures are available. Nonetheless, the agreement with experimental structures is not as good as for larger proteins, indicating that a much larger set of experimental microprotein structures is needed to better evaluate, and eventually calibrate, prediction methods.