Paper Title
Real-time Dense Reconstruction of Tissue Surface from Stereo Optical Video
Paper Authors
Paper Abstract
We propose an approach to reconstruct a dense three-dimensional (3D) model of the tissue surface from stereo optical videos in real time. The basic idea is to first extract 3D information from the video frames by stereo matching, and then to mosaic the reconstructed 3D models. To handle the low-texture regions common on tissue surfaces, we propose effective post-processing steps for the local stereo matching method to enlarge the radius of constraint, which include outlier removal, hole filling and smoothing. Since the tissue models obtained by stereo matching are limited to the field of view of the imaging modality, we propose a model mosaicking method that uses a novel feature-based simultaneous localization and mapping (SLAM) method to align the models. Low-texture regions and varying illumination conditions may lead to a large percentage of feature matching outliers. To solve this problem, we propose several algorithms to improve the robustness of SLAM, mainly including (1) a histogram voting-based method to roughly select possible inliers from the feature matching results, (2) a novel 1-point RANSAC-based P$n$P algorithm, called DynamicR1PP$n$P, to track the camera motion, and (3) a GPU-based iterative closest points (ICP) and bundle adjustment (BA) method to refine the camera motion estimation results. Experimental results on ex vivo and in vivo data showed that the reconstructed 3D models have high-resolution texture with an accuracy error of less than 2 mm. Most algorithms are highly parallelized for GPU computation, and the average runtime for processing one key frame is 76.3 ms on stereo images at 960×540 resolution.
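The histogram voting-based inlier preselection mentioned in item (1) can be illustrated with a minimal sketch. The assumption here (not stated in the abstract) is that voting is performed on the 2D displacement vectors of the feature matches, with the dominant histogram bin taken as the consensus motion; the paper's actual voting statistic may differ. The function name and parameters are hypothetical.

```python
import numpy as np

def histogram_voting_inliers(pts1, pts2, bins=8):
    """Roughly preselect likely inlier matches by 2D histogram voting
    on match displacement vectors (a sketch; the paper's exact voting
    scheme is not specified in the abstract).

    pts1, pts2 -- (N, 2) arrays of matched keypoint coordinates.
    Returns the indices of matches falling in the dominant bin.
    """
    disp = np.asarray(pts2, float) - np.asarray(pts1, float)  # (N, 2)
    hist, xe, ye = np.histogram2d(disp[:, 0], disp[:, 1], bins=bins)
    ix, iy = np.unravel_index(np.argmax(hist), hist.shape)    # dominant bin
    in_x = (disp[:, 0] >= xe[ix]) & (disp[:, 0] <= xe[ix + 1])
    in_y = (disp[:, 1] >= ye[iy]) & (disp[:, 1] <= ye[iy + 1])
    return np.flatnonzero(in_x & in_y)      # indices of probable inliers
```

Matches whose displacement disagrees with the dominant bin are discarded before the subsequent RANSAC stage, which is what makes a 1-point RANSAC variant viable despite a large raw outlier ratio.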