An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer

Authors: Yang Xi; Wu Chengkun*; Lu Kai; Fang Lin; Zhang Yong; Li Shengkang; Guo Guixin; Du YunFei*; Wu CK*; Du YF*
Source: Molecules, 2017, 22(12): 2116.
DOI: 10.3390/molecules22122116

Abstract

Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the allocate-when-needed paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

  • Affiliations
    BGI-Shenzhen; National University of Defense Technology (PLA)

Full text