Abstract

Circular RNAs (circRNAs) can exert biological functions by interacting with RNA-binding proteins (RBPs), and several deep learning-based methods have been developed to predict RBP binding sites on circRNAs. However, most of these methods rely on a single data resource and cannot provide exact binding sites; they report only a probability value for a sequence fragment. To solve these problems, we propose a binding site localization algorithm that fuses binding sites from multiple databases, and we further design a stacked generalization ensemble deep learning model, named CirRBP, to identify RBP binding sites on circRNAs. CirRBP is trained on the combined binding sites from multiple databases and makes predictions by taking a weighted aggregate of the predictions of its sub-models. The results show that CirRBP outperforms each of its sub-models as well as existing online prediction models. To make our research results more accessible, we developed an open-source web application called CRWS (CircRNA-RBP Web Server). The back-end learning model of CRWS is CirRBP, a stacked generalization ensemble learning model based on different deep learning frameworks. Given a full-length circRNA or a sequence fragment and a target RBP, CRWS applies the binding site localization algorithm to identify the exact potential binding sites of the target RBP on the given sequence and visualizes them. In addition, CRWS can discover the most widely distributed motif in each RBP dataset. To date, CRWS is the first significant online tool that uses multi-source data to train models and predict exact binding sites. CRWS is publicly and freely available, with no login required, at: http://www.bioinformatics.team.
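
The weighted aggregation step mentioned above can be illustrated with a minimal sketch. This is not the CirRBP implementation: the function name `weighted_aggregate`, the number of sub-models, and the fixed weights are all hypothetical, and in a stacked generalization setting the weights would normally be learned by a meta-learner rather than set by hand.

```python
def weighted_aggregate(sub_model_probs, weights):
    """Combine per-fragment binding probabilities from several sub-models.

    sub_model_probs: one list of probabilities per sub-model, all the same length.
    weights: one non-negative weight per sub-model (hypothetical values here).
    Returns the weight-normalized average probability for each fragment.
    """
    if not sub_model_probs or len(sub_model_probs) != len(weights):
        raise ValueError("need exactly one weight per sub-model")
    n_fragments = len(sub_model_probs[0])
    if any(len(probs) != n_fragments for probs in sub_model_probs):
        raise ValueError("all sub-models must score the same fragments")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted sum across sub-models, normalized so the result stays in [0, 1].
    return [
        sum(w * probs[i] for w, probs in zip(weights, sub_model_probs)) / total
        for i in range(n_fragments)
    ]


# Two hypothetical sub-models scoring two sequence fragments.
combined = weighted_aggregate([[0.9, 0.2], [0.7, 0.4]], weights=[3.0, 1.0])
```

Normalizing by the sum of the weights keeps the combined score interpretable as a probability regardless of how the weights are scaled.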

  • Affiliation
    Guilin University of Technology