RAfilter: an algorithm for detecting and filtering false-positive alignments in repetitive genomic regions

Authors: Yang, Jinbao; Zhao, Xianjia; Jiang, Heling; Yang, Yingxue; Hou, Yuze; Pan, Weihua*
Source: Horticulture Research, 2023, 10(1): uhac288.
DOI: 10.1093/hr/uhac288

Abstract

Telomere-to-telomere (T2T) assembly relies on the correctness of sequence alignments. However, existing aligners tend to generate a high proportion of false-positive alignments in repetitive genomic regions, which impedes the generation of T2T-level reference genomes for more important species. In this paper, we present an automatic algorithm called RAfilter for removing false positives from the outputs of existing aligners. RAfilter uses rare k-mers, which represent copy-specific features, to differentiate false-positive alignments from correct ones. Because large eukaryotic genomes contain huge numbers of rare k-mers, a series of high-performance computing techniques, such as multi-threading and bit operations, are used to improve time and space efficiency. Experimental results on tandem repeats and interspersed repeats show that RAfilter filtered 60%-90% of false-positive HiFi alignments while removing almost no correct ones; on ONT datasets, sensitivity and precision were about 80% and 50%, respectively.
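
The idea of using rare k-mers as copy-specific markers can be illustrated with a small Python sketch. This is not RAfilter's implementation; it is a simplified illustration under stated assumptions. The function names (encode_kmers, rare_kmers, support, keep_alignment), the max_count and min_support thresholds, and the scoring rule (the fraction of rare k-mers in the aligned reference window that also occur in the read) are all hypothetical. The 2-bit packing of bases shows one common form of the bit operations the abstract mentions.

```python
from collections import Counter

# 2-bit code per base; any other character (e.g. N) resets the rolling k-mer.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmers(seq, k):
    """Yield (start_position, packed_integer) for every valid k-mer in seq."""
    mask = (1 << (2 * k)) - 1  # keeps only the lowest 2*k bits
    code = 0
    valid = 0  # consecutive valid bases seen so far
    for i, base in enumerate(seq):
        bits = BASE_BITS.get(base)
        if bits is None:
            code, valid = 0, 0
            continue
        code = ((code << 2) | bits) & mask  # shift in the new base
        valid += 1
        if valid >= k:
            yield i - k + 1, code


def rare_kmers(reference, k, max_count=1):
    """Return the set of packed k-mers occurring at most max_count times."""
    counts = Counter(code for _, code in encode_kmers(reference, k))
    return {code for code, n in counts.items() if n <= max_count}


def support(read, reference, ref_start, ref_end, rare, k):
    """Fraction of rare k-mers in the aligned window that the read contains.

    Returns None when the window holds no rare k-mer (no evidence either way).
    """
    window = {
        code
        for _, code in encode_kmers(reference[ref_start:ref_end], k)
        if code in rare
    }
    if not window:
        return None
    read_codes = {code for _, code in encode_kmers(read, k)}
    return len(window & read_codes) / len(window)


def keep_alignment(read, reference, ref_start, ref_end, rare, k, min_support=0.5):
    """Keep an alignment unless rare k-mer evidence contradicts it."""
    score = support(read, reference, ref_start, ref_end, rare, k)
    return score is None or score >= min_support


# Toy example: two repeat copies that differ by a single base.
copy1 = "ACGTTGCAAGGCTTAACCGT"
copy2 = "ACGTTGCAAGTCTTAACCGT"
reference = copy1 + copy2
read = copy2  # the read truly originates from the second copy
rare = rare_kmers(reference, k=5)

print(keep_alignment(read, reference, 0, 20, rare, k=5))   # alignment to copy 1
print(keep_alignment(read, reference, 20, 40, rare, k=5))  # alignment to copy 2
```

In this toy case the k-mers shared by both copies occur twice and are ignored, so only the k-mers that overlap the single differing base decide which placement of the read is retained.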

Full text