C3VDv2: Colonoscopy 3D Video Dataset with Enhanced Realism

Lucas Sebastian Galeano Fretes
Loren Ayers

Johns Hopkins University

Abstract

Computer vision techniques have the potential to improve the diagnostic performance of colonoscopy, but the lack of 3D colonoscopy datasets for training and validation hinders their development. This paper introduces C3VDv2, the second version (v2) of the high-definition Colonoscopy 3D Video Dataset, featuring enhanced realism designed to facilitate the quantitative evaluation of 3D colon reconstruction algorithms. A total of 192 video sequences were captured by imaging 60 unique, high-fidelity silicone colon phantom segments. Ground truth depth, surface normals, optical flow, occlusion, six-degree-of-freedom pose, coverage maps, and 3D models are provided for 169 colonoscopy videos. Eight simulated screening colonoscopy videos acquired by a gastroenterologist are provided with ground truth poses. The dataset also includes 15 videos featuring colon deformations for qualitative assessment. C3VDv2 emulates diverse and challenging scenarios for 3D reconstruction algorithms, including fecal debris, mucus pools, blood, debris obscuring the colonoscope lens, en-face views, and fast camera motion. The enhanced realism of C3VDv2 will allow for more robust and representative development and evaluation of 3D reconstruction algorithms.

arXiv Paper Code

What's New?

  • Larger dataset with 8× as many videos (n=192) and 2× as many colon geometries as C3VD.
  • Realistic artifacts such as fecal debris, mucus pools, blood, foam, and debris and water on the lens, along with instrument actions such as water jets, lens cleaning, and suction.
  • Challenging scenarios, including fast and less smooth camera motion, en-face to down-the-barrel transitions, close-up en-face views of textureless surfaces, the scope getting covered in debris, and lens cleaning. Trajectories include straight-line in-and-out motions, loops where the first and last points coincide, and paths where the first half mirrors the second half with lens cleaning in the middle.
  • Colon deformation videos for qualitative assessment. Camera poses and undeformed 3D models are provided, but no pixel-wise ground truth (GT).
  • Paired clean & debris-filled colon frames. For every debris-filled colon video, there is a corresponding clean colon video with the same camera trajectory, imaging the same colon phantom.

Examples

Polyp cleaning with water jet, followed by scope dipping in a mucus pool and lens cleaning.

Fast Loop

Flowing red debris with dirty lens and lens cleaning. First half of camera trajectory mirrors the second half.

Exploratory Motion

En-face to down-the-barrel motion.


Synchronized clean and debris colon video pair.


Colon deformation video.


Dataset

C3VDv2 consists of two distinct colon shapes (c1 and c2), each divided into seven to eight anatomical segments, and each segment is produced in four unique textures and colors (t1, t2, t3, and t4). C3VDv2 contains 192 videos with a total of 169,371 frames. It comprises three different types of video sequences:

  • Pixel-level Ground Truth Videos: Registered Videos were acquired with a static, undeformed colon phantom and are provided with per-frame ground truth maps (depth, normals, optical flow, etc.). Up to three videos were recorded per phantom segment:
    • v1: clean colon with a baseline camera trajectory and imaging settings.
    • v2: clean colon with a different camera trajectory and different imaging settings than v1.
    • v3: debris-filled colon using the same camera trajectory and imaging settings as v2.
    This category includes 169 short videos with a total of 67,886 frames.
  • Deformation Videos: Deformation Videos consist of v4 videos featuring externally induced active phantom deformation, captured with either static or linear camera motion. All videos include debris. Each folder contains all recorded RGB frames and, when the camera is not stationary, a corresponding pose.txt file. The camera poses are stored frame by frame in homogeneous form. This category includes 15 short videos with a total of 6,185 frames.
  • Simulated Screening Videos: Screening Videos comprise full-colon withdrawal sequences performed by a gastroenterologist to capture realistic camera motion. As with the deformation videos, only RGB frames and camera poses in pose.txt are provided. A total of 8 videos are included, comprising 95,300 frames.
Parameters such as camera trajectory, speed, edge enhancement settings, simulated artifacts, and descriptions of challenging cases are documented in C3VDv2_Data_Summary_Sheet_v1.xlsx.
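
As a rough illustration of reading the pose.txt files mentioned above, the Python sketch below parses one pose per line. It assumes each line holds the 16 values of a 4 × 4 homogeneous matrix in row-major order (the layout this page describes for registered videos), separated by commas or whitespace; the separator, the function names, and the sanity check are assumptions for illustration and should be verified against the actual files.

```python
def parse_pose_line(line):
    """Parse one text line of 16 numbers into a 4x4 nested list (row-major)."""
    # Accept either comma- or whitespace-separated values (assumed format).
    values = [float(v) for v in line.replace(",", " ").split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 values per pose, got {len(values)}")
    return [values[4 * r: 4 * r + 4] for r in range(4)]


def split_pose(T):
    """Return (rotation, translation): the 3x3 block and the last column."""
    rotation = [row[:3] for row in T[:3]]
    translation = [row[3] for row in T[:3]]
    return rotation, translation


def has_homogeneous_bottom_row(T, tol=1e-6):
    """Check that the last row is [0, 0, 0, 1].

    If this fails for real data, the file may store the transposed
    (column-major) matrix instead of the layout assumed here.
    """
    return all(abs(a - b) <= tol for a, b in zip(T[3], (0.0, 0.0, 0.0, 1.0)))


# Hypothetical example line: identity rotation, translation (5, 6, 7).
T = parse_pose_line("1,0,0,5,0,1,0,6,0,0,1,7,0,0,0,1")
rotation, translation = split_pose(T)
```

Running the bottom-row check on the first few lines of a file is a cheap way to confirm the assumed layout before using the poses.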

The dataset is publicly hosted on the Johns Hopkins Research Data Repository. You can download it directly from the repository page or via the links below (based on the Dataverse API). We also provide a bash script that downloads the data via Dataverse API calls.

Registered Videos

For each registered video frame, the dataset includes:

  • RGB frame: rgb/NNNN.png represents the raw (distorted) video frame from the Olympus CF-HQ190L video colonoscope. The black border with video metadata was cropped, resulting in an image size of 1350 x 1080 pixels. NNNN denotes the 4-digit frame number within the video.
  • Depth frame: depth/NNNN_depth.tiff represents the depth along the camera frame's Z-axis, clamped between 0 and 100 mm, and linearly scaled and encoded as a 16-bit grayscale image. For example, a pixel value of 16,384 corresponds to a depth of 25 mm.
  • Surface normal frame: normals/NNNN_normals.tiff stores the X, Y, and Z components of the surface normal vector for each surfel in the R, G, and B color channels, respectively. Components are linearly scaled from the range [-1, 1] to [0, 65535] and encoded as a 16-bit color image. Normal vector directions are defined with respect to the camera coordinate system: +x points right, +y points down, and +z points along the viewing direction (i.e., away from the camera).
  • Optical flow frame: optical_flow/NNNN_flow.tiff depicts the optical flow from the current to the previous frame. X-direction motion (left to right, clamped between -20 and 20 pixels) is stored in the red channel, and Y-direction motion (up to down, clamped between -20 and 20 pixels) is stored in the green channel. Flow values are linearly scaled and encoded as a 16-bit color image.
  • Occlusion frame: occlusions/NNNN_occlusion.tiff indicates pixels that occlude other mesh faces within 100 mm of the camera origin, assigning a value of 255 to these pixels and 0 to all others. This binary data is encoded as an 8-bit grayscale image.
  • Diffuse frame: diffuse/NNNN_diffuse.png encodes Lambertian reflectance, computed using the dot product of the surface normal and the direction of the incident light. Reflectance values range from 0.1 to 1.0 and are linearly scaled and encoded as an 8-bit grayscale image.
  • Camera pose: pose.txt contains each frame's flattened homogeneous camera-to-world transformation matrix (row-major order).
  • 3D model and coverage map: coverage_mesh.obj stores the ground truth triangulated mesh. Texture vertices store coverage values, where vt=1 indicates an observed face, and vt=2 indicates an unobserved face.
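
The linear encodings described in the list above can be inverted with simple arithmetic. The following Python sketch is a minimal illustration, not the authors' loader: it assumes that a raw value of 0 maps to the lower bound of each stated range and 65535 to the upper bound (consistent with the depth example, where 16,384 corresponds to about 25 mm), and that optical flow uses the same convention over -20 to 20 pixels. All function names are hypothetical. Because the functions use only arithmetic, they should also work elementwise on NumPy arrays read with an image library of your choice.

```python
UINT16_MAX = 65535  # largest value of a 16-bit channel


def decode_depth_mm(raw, max_depth_mm=100.0):
    """Map a 16-bit depth value to millimetres along the camera Z-axis."""
    return raw / UINT16_MAX * max_depth_mm


def decode_normal_component(raw):
    """Map a 16-bit channel value to a normal component in [-1, 1]."""
    return raw / UINT16_MAX * 2.0 - 1.0


def decode_flow_px(raw, max_flow_px=20.0):
    """Map a 16-bit channel value to a flow component in pixels.

    Assumes 0 encodes -max_flow_px and 65535 encodes +max_flow_px.
    """
    return raw / UINT16_MAX * (2.0 * max_flow_px) - max_flow_px


# The depth example from the text: 16,384 is roughly 25 mm.
print(round(decode_depth_mm(16384), 2))  # 25.0
```

Values at the ends of each range (for example, a depth of exactly 100 mm or a flow of ±20 pixels) may be clamped rather than measured, so it can be worth masking them out before computing error metrics.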
Colon | Segment | Phantom Number | Video Number | # Frames | Preview | Download

Deformation Videos

Colon | Segment | Phantom Number | Video Number | # Frames | Preview | Download

Screening Videos

Colon | Segment | Phantom Number | Video Number | # Frames | Preview | Download

3D Model Files

Colon | Segment | Lumen Download | Mold Download

Camera Calibration Files

The spherical omnidirectional camera intrinsics are given in camera_intrinsics.txt. Additionally, two calibration sequences, one for geometric and one for photometric calibration, are provided in the camera_calibration folder.


Original C3VD

Registered Videos (C3VD v1)

Colon | Segment | Texture | Video | # Frames | Preview | Old Name | Download

Screening Videos (C3VD v1)

Colon | Segment | Texture | Video | # Frames | Preview | Download | Old Name

3D Model Files (C3VD v1)

Colon | Segment | Lumen Download | Mold Download

Calibration Files

Citation

Please consider citing our publications if you use code or data from this site.

 
@article{golhar2025c3vdv2,
  title={C3VDv2--Colonoscopy 3D video dataset with enhanced realism},
  author={Golhar, Mayank V and Fretes, Lucas Sebastian Galeano and Ayers, Loren and Akshintala, Venkata S and Bobrow, Taylor L and Durr, Nicholas J},
  journal={arXiv preprint arXiv:2506.24074},
  year={2025}
}

@article{bobrow2023,
  title={Colonoscopy 3D video dataset with paired depth from 2D-3D registration},
  author={Bobrow, Taylor L and Golhar, Mayank and Vijayan, Rohan and Akshintala, Venkata S and Garcia, Juan R and Durr, Nicholas J},
  journal={Medical Image Analysis},
  pages={102956},
  year={2023},
  publisher={Elsevier}
}

This work is licensed under CC BY-NC-SA 4.0