Colonoscopy 3D Video Dataset (C3VD)

from Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration

Johns Hopkins University

Abstract

Screening colonoscopy is an important clinical application for several 3D computer vision techniques, including depth estimation, surface reconstruction, and missing region detection. However, the development, evaluation, and comparison of these techniques in real colonoscopy videos remain largely qualitative due to the difficulty of acquiring ground truth data. In this work, we present a Colonoscopy 3D Video Dataset (C3VD) acquired with a high definition clinical colonoscope and high-fidelity colon models for benchmarking computer vision methods in colonoscopy. We introduce a novel multimodal 2D-3D registration technique to register optical video sequences with ground truth rendered views of a known 3D model. The different modalities are registered by transforming optical images to depth maps with a Generative Adversarial Network and aligning edge features with an evolutionary optimizer. This registration method achieves an average translation error of 0.321 millimeters and an average rotation error of 0.159 degrees in simulation experiments where error-free ground truth is available. The method also leverages video information, improving registration accuracy by 55.6% for translation and 60.4% for rotation compared to single frame registration. 22 short video sequences were registered to generate 10,015 total frames with paired ground truth depth, surface normals, optical flow, occlusion, six degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes screening videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. The dataset and registration source code are available at durr.jhu.edu/C3VD.

Paper

Citation

Please cite our publication if you use code or data from this site.

 @article{bobrow2023,
  title={Colonoscopy 3D video dataset with paired depth from 2D-3D registration},
  author={Bobrow, Taylor L and Golhar, Mayank and Vijayan, Rohan and Akshintala, Venkata S and Garcia, Juan R and Durr, Nicholas J},
  journal={Medical Image Analysis},
  pages={102956},
  year={2023},
  publisher={Elsevier},
}

Results

Colonoscopy video frames (left) are registered with rendered views of a ground truth 3D model (right). Edge features (overlay) are aligned by optimizing a loss function (bottom).

Real colonoscope frames are paired with registered ground truth depth, surface normals, occlusion, and optical flow frames

Dataset

C3VD contains 22 registered videos with paired ground truth depth, surface normals, optical flow, occlusion, six degree-of-freedom pose, coverage maps, and 3D models. The dataset also includes 4 screening colonoscopy videos acquired by a gastroenterologist with paired ground truth pose and 3D surface models. 3D model files and molds are also available for download. Registration and rendering code is made available on GitHub.

Registered Videos

For each registered video frame, the dataset includes:

Depth frame: depth along the camera frame’s z-axis, clamped from 0-100 millimeters. Values are linearly scaled and encoded as a 16-bit grayscale image.
Surface normal frame: reported with respect to the camera coordinate system. X/Y/Z components are stored in separate R/G/B color channels. Components are linearly scaled from ± 1 to 0-65535. Values are encoded as a 16-bit color image.
Optical flow frame: computed flowing from the current frame to the previous frame, meaning the first frame in the sequence has no value. Values are saved in a color image, where the R-channel contains X-direction motion (left→right, -20 to 20 pixels), and the G-channel contains Y-direction motion (up→down, -20 to 20 pixels). Values are linearly scaled from 0 to 65535 and encoded as a 16-bit color image.

Occlusion frame: encoded as an 8-bit binary image. Pixels occluding other mesh faces within 100mm of the camera origin are assigned a value of 255, and all other pixels are assigned a value of 0.
Camera pose: saved in a file named pose.txt. Each line contains a homogenous camera-to-world transformation matrix (flattened in row-major order) corresponding to each frame.

For each video sequence, we also provide:

3D model and coverage map: ground truth triangulated mesh, stored as a Wavefront OBJ file named coverage_mesh.obj. Coverage is embedded in the OBJ file by texture vertices assigned to each face (vt=1 is observed, vt=2 is unobserved).

Model	Texture	Video	# Frames		Download
Cecum	1	a	276	Preview	cecum_t1_a.zip (2.86 GB)
Cecum	1	b	765	Preview	cecum_t1_b.zip (8.36 GB)
Cecum	2	a	370	Preview	cecum_t2_a.zip (3.71 GB)
Cecum	2	b	1,142	Preview	cecum_t2_b.zip (11.06 GB)
Cecum	2	c	595	Preview	cecum_t2_c.zip (6.13 GB)
Cecum	3	a	730	Preview	cecum_t3_a.zip (6.80 GB)
Cecum	4	a	465	Preview	cecum_t4_a.zip (5.04 GB)
Cecum	4	b	425	Preview	cecum_t4_b.zip (4.41 GB)
Descending Colon	4	a	148	Preview	desc_t4_a.zip (1.24 GB)
Sigmoid Colon	1	a	700	Preview	sigmoid_t1_a.zip (5.20 GB)
Sigmoid Colon	2	a	514	Preview	sigmoid_t2_a.zip (4.22 GB)
Sigmoid Colon	3	a	613	Preview	sigmoid_t3_a.zip (4.58 GB)
Sigmoid Colon	3	b	536	Preview	sigmoid_t3_b.zip (4.21 GB)
Transcending Colon	1	a	61	Preview	trans_t1_a.zip (0.59 GB)
Transcending Colon	1	b	700	Preview	trans_t1_b.zip (5.07 GB)
Transcending Colon	2	a	194	Preview	trans_t2_a.zip (1.58 GB)
Transcending Colon	2	b	103	Preview	trans_t2_b.zip (0.97 GB)
Transcending Colon	2	c	235	Preview	trans_t2_c.zip (1.83 GB)
Transcending Colon	3	a	250	Preview	trans_t3_a.zip (1.83 GB)
Transcending Colon	3	b	214	Preview	trans_t3_b.zip (1.66 GB)
Transcending Colon	4	a	382	Preview	trans_t4_a.zip (3.10 GB)
Transcending Colon	4	b	597	Preview	trans_t4_b.zip (4.61 GB)

Screening Videos

In addition to the video sequence, each file also contains camera pose information saved in a file named pose.txt. Each line contains a homogenous pose (flattened in row-major order) corresponding to each frame.

Model	Texture	# Frames		Download
Full Colon	1	5,458	Preview	screening_t1.zip (8.13 GB)
Full Colon	2	5,100	Preview	screening_t2.zip (7.09 GB)
Full Colon	3	4,726	Preview	screening_t3.zip (7.07 GB)
Full Colon	4	4,774	Preview	screening_t4.zip (7.36 GB)

3D Model Files

Model	Object Download	Mold Download
Ascending Colon	ascend_model.obj (25.4 MB)	ascend_mold.zip (18.7 MB)
Cecum	cecum_model.obj (54.8 MB)	cecum_mold.zip (24.9 MB)
Descending Colon	desc_model.obj (38.0 MB)	desc_mold.zip (26.6 MB)
Sigmoid Colon	sigmoid_model.obj (20.8 MB)	sigmoid_mold.zip (42.2 MB)
Transcending Colon	trans_model.obj (18.3 MB)	trans_mold.zip (24.1 MB)
Full Colon	full_model.obj (194.8 MB)

Calibration Files

cfhq190l_10x10mm_checkerboard_images.zip

cfhq190l_omnidirectional_params.zip

Revision History

10/14/2023 | Updated the dataset file names to reflect peer-review completion.

05/03/2023 | Revised ground truth surface normal frames and updated naming convention:

Corrected an error in the rendering code clipped negative surface normal z-components to 0 and resulted in some surface normals having a non-unitary length.
Surface normal axes were updated from +x pointing right, +y pointing up, and +z pointing out of the screen to +x pointing right, +y pointing down, and +z pointing into the screen to be consistent with the camera coordinate system as shown in Figure 3 of the paper.
The naming convention of the frames was updated to include zero padding (e.g. 0005_color.png).

Registration and Rendering Code Available on GitHub

Check out Version 2 of C3VD Here

This work is licensed under CC BY-NC-SA 4.0