To obtain high-quality 3D reconstructions of the Jinji Lake environment, this paper presents an end-to-end, deep learning-based pipeline that reconstructs dense point clouds from a large number of multi-view images. The method first applies incremental structure from motion (SfM) to estimate camera parameters from the multi-view images. The camera parameters output by SfM are then encoded into a learning-based multi-view stereo (MVS) dense reconstruction network, which produces depth maps. The network extracts image features with 2D convolutions, builds a cost volume from these features over a set of discrete depth values, and regularizes the cost volume with 3D convolutions. The learning-based MVS network is thus a deep learning implementation of the traditional MVS algorithm. Dense reconstruction results on part of the Jinji Lake scene show that the learning-based approach achieves higher reconstruction quality than traditional methods and is much faster.
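The cost-volume and depth-estimation steps can be illustrated with a small NumPy sketch. It assumes an MVSNet-style design, which the abstract does not specify: source-view features are taken as already warped onto the reference view's depth planes (homography warping is omitted), views are aggregated by their variance, and depth is regressed as the expectation over a softmax probability volume. The function names `variance_cost_volume` and `regress_depth` are hypothetical. Averaging over feature channels is a crude stand-in for the learned 3D convolutional regularization, not the paper's network.

```python
import numpy as np


def variance_cost_volume(warped_features):
    """Aggregate warped features into a cost volume (assumed variance metric).

    warped_features: array of shape (V, D, C, H, W), i.e. the features of V views
    (reference included) warped onto D depth hypotheses of the reference view.
    Returns an array of shape (D, C, H, W); low variance means the views agree.
    """
    mean = warped_features.mean(axis=0)
    return ((warped_features - mean) ** 2).mean(axis=0)


def regress_depth(cost, depth_values):
    """Turn a (D, H, W) cost volume into an (H, W) depth map.

    Lower cost is treated as more likely: softmax over the depth axis of the
    negated cost gives a probability volume, and the depth estimate is the
    probability-weighted sum of the depth hypotheses (soft argmin).
    """
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    return np.tensordot(depth_values, prob, axes=(0, 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, D, C, H, W = 4, 5, 8, 6, 7
    depth_values = np.linspace(1.0, 5.0, D)
    # Random features disagree across views at every depth hypothesis...
    warped = 10.0 * rng.standard_normal((V, D, C, H, W))
    # ...except hypothesis 2, where all views see identical features.
    warped[:, 2] = rng.standard_normal((C, H, W))
    # Channel averaging stands in for learned 3D-CNN regularization.
    cost = variance_cost_volume(warped).mean(axis=1)
    print(regress_depth(cost, depth_values))  # values close to 3.0
```

In a trained network, the 3D CNN would replace the channel average and learn to smooth the cost volume spatially and along depth, which matters in weakly textured regions such as water surfaces.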