ScholarMate
客服热线:400-1616-289

DeepHost: phage host prediction with convolutional neural network

Wang Ruohan*; Zhang Xianglilan; Wang Jianping; Li Shuai Cheng*
Science Citation Index Expanded
-

摘要

Next-generation sequencing expands the known phage genomes rapidly. Unlike culture-based methods, the hosts of phages discovered from next-generation sequencing data remain uncharacterized. The high diversity of the phage genomes makes the host assignment task challenging. To solve the issue, we proposed a phage host prediction tool-DeepHost. To encode the phage genomes into matrices, we design a genome encoding method that applied various spaced k-mer pairs to tolerate sequence variations, including insertion, deletions, and mutations. DeepHost applies a convolutional neural network to predict host taxonomies. DeepHost achieves the prediction accuracy of 96.05% at the genus level (72 taxonomies) and 90.78% at the species level (118 taxonomies), which outperforms the existing phage host prediction tools by 10.16-30.48% and achieves comparable results to BLAST. For the genomes without hits in BLAST, DeepHost obtains the accuracy of 38.00% at the genus level and 26.47% at the species level, making it suitable for genomes of less homologous sequences with the existing datasets. DeepHost is alignment-free, and it is faster than BLAST, especially for large datasets. DeepHost is available at https://github.com/deepomicslab/DeepHost.

关键词

phage-host relationship convolutional neural network genome encoding