Abstract

We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images. Although existing state-of-the-art methods achieve satisfactory results, accuracy and scalability remain open problems because dense matching is unreliable and cost volume regularization is memory-intensive. To address this, we propose MFNet, a multi-view stereo network built on a multi-level fusion-aware feature pyramid, for reliable depth inference. First, we adopt a coarse-to-fine strategy that estimates high-resolution depth starting from a coarse depth map. At each stage, the depth search interval is narrowed using the depth estimated at the previous stage as a prior, which substantially reduces memory consumption. Second, we construct the feature pyramid with multi-level fusion, so that features at different levels exchange information with one another, yielding rich multi-level feature representations. Finally, we replace the variance-based similarity measure used in previous works for cost volume construction with group-wise correlation, which gives a lightweight yet effective cost volume representation. Experimental results on the DTU, Tanks & Temples, and BlendedMVS benchmark datasets show that MFNet achieves better results than state-of-the-art methods.
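To make two of the ideas above concrete, the following is a minimal per-pixel sketch in plain Python, not the MFNet implementation. It assumes the common formulation of group-wise correlation (channels split into equal groups, with a mean dot product per group) and a symmetric, evenly spaced depth search window centred on the previous stage's estimate. The function names `groupwise_correlation` and `narrowed_depth_hypotheses`, and all parameter names, are hypothetical and chosen only for illustration.

```python
def groupwise_correlation(ref_feat, src_feat, num_groups):
    """Group-wise correlation between two feature vectors at one pixel.

    ref_feat, src_feat: lists of C floats (reference feature and the source
    feature warped to one depth hypothesis). Returns num_groups similarity
    values, so the cost has num_groups channels rather than C.
    Assumption: C is divisible by num_groups.
    """
    channels = len(ref_feat)
    if len(src_feat) != channels:
        raise ValueError("feature vectors must have the same length")
    if num_groups <= 0 or channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")

    group_size = channels // num_groups
    similarities = []
    for g in range(num_groups):
        start = g * group_size
        end = start + group_size
        # Mean inner product over the channels of this group.
        dot = sum(r * s for r, s in zip(ref_feat[start:end], src_feat[start:end]))
        similarities.append(dot / group_size)
    return similarities


def narrowed_depth_hypotheses(prev_depth, interval, num_hypotheses):
    """Depth hypotheses for the next (finer) stage at one pixel.

    Assumption: the search window is centred on the previous stage's depth
    estimate and sampled at a fixed interval.
    """
    if num_hypotheses <= 0:
        raise ValueError("num_hypotheses must be positive")
    start = prev_depth - interval * (num_hypotheses - 1) / 2
    return [start + i * interval for i in range(num_hypotheses)]


if __name__ == "__main__":
    # 4 channels split into 2 groups gives a 2-channel similarity.
    print(groupwise_correlation([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 2.0, 2.0], 2))
    # 3 hypotheses, 2.0 apart, centred on a coarse estimate of 10.0.
    print(narrowed_depth_hypotheses(10.0, 2.0, 3))
```

In this sketch the cost at each pixel and depth hypothesis has `num_groups` channels instead of one per feature channel, which is where the lighter cost volume comes from. Later stages search only a short window around the previous estimate instead of the full depth range.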