Methods for Analyzing RNA Pseudoknots via Chord Diagrams and Intersection Graphs
Abstract
RNA molecules are known to form complex secondary structures including pseudoknots. A systematic framework for the enumeration, classification and prediction of secondary structures is critical to determine the biological significance of the molecular configurations of RNA. Chord diagrams are mathematical objects widely used to represent RNA secondary structures and to analyze structural motifs, however a mathematically rigorous enumeration of pseudoknots remains a challenge. We introduce a method that incorporates a distance-based metric to analyze the intersection graph of a chord diagram associated with a pseudoknotted structure. In particular, our method formally defines a pseudoknot in terms of a weighted vertex cover of a certain intersection graph constructed from a partition of the chord diagram representing the nucleotide sequence of the RNA molecule. In this graph-theoretic context, we introduce a rigorous algorithm that enumerates pseudoknots, classifies secondary structures, and is sensitive to three-dimensional topological features. We implement our methods in MATLAB and test the algorithm on pseudoknotted structures from the bpRNA-1m database. Our findings confirm that genus is a robust quantifier of pseudoknot complexity.