[Vision] Depth Estimation, Depth Completion and Depth refinement

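💡 This post summarizes '[Vision] Depth Estimation, Depth Completion and Depth refinement'.
Working in 3D space often requires accurate depth information. There are many depth-related methodologies and corresponding algorithms, so I have categorized them and summarized them in detail here for reference.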

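0. Summary of Depth Tasks

Depth estimation: estimates depth from an RGB image when no depth is available.
Depth completion: densifies an accurate but sparse depth map, using the RGB image as guidance.
Depth refinement/enhancement: further improves the quality of an already dense depth map.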

1. Depth Estimation

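Depth estimation is, as the name says, the task of estimating depth from an image.

Source: https://github.com/OniroAI/MonoDepth-PyTorch

Let's use the figure above as an example. The car on the left is close to the camera, while the yellow sign between the center and the right side is farther away than the car. A person can look at a photo and roughly tell which objects are near and which are far. A computer, however, cannot easily infer depth from a photo alone, so we train a model to do it. In the image below the original, nearby regions are rendered in bright colors and distant regions in dark purple.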

Depth estimation comes in two flavors, stereo and mono. Classically, estimating depth requires a left image and a right image captured with a stereo camera. Depth is obtained from the disparity, i.e., how far apart the same scene point appears in the left and right images.

In the figure, the two images look the same, but the positions are slightly shifted. These are the left and right images captured by a stereo camera. In short, estimating depth requires the disparity, and computing the disparity requires both the left and the right image, since the disparity is the positional difference between them.
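
For a rectified stereo pair, disparity and depth are related by depth = focal length × baseline / disparity. The snippet below is a minimal sketch of that conversion. The function name `disparity_to_depth` and the focal length and baseline values are hypothetical and chosen only for illustration; real values come from the camera calibration.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (pixels) to depth (meters) for a rectified stereo pair."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    # Zero disparity means the point is at infinity (or has no valid match).
    depth_m = np.full(disparity_px.shape, np.inf)
    valid = disparity_px > 0
    depth_m[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth_m

# Hypothetical calibration: focal length in pixels, baseline in meters.
focal_px = 720.0
baseline_m = 0.54
disparity = np.array([[64.0, 32.0],
                      [8.0, 0.0]])
depth = disparity_to_depth(disparity, focal_px, baseline_m)
print(depth)
```

Note that a large disparity corresponds to a near point and a small disparity to a far point, so a one-pixel disparity error costs much more depth accuracy at long range than at short range.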

Mono depth estimation, by contrast, estimates depth from just a single image.

1-1) Mono Depth Estimation

Dataset & Benchmark

NYU-Depth V2 (indoor)

[PaperWithCode] Monocular Depth Estimation on NYU-Depth V2: https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2

KITTI Eigen split (Outdoor)

[KITTI] Depth Prediction Evaluation: https://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_prediction

Metric Depth Estimation

For the NYU Depth V2, KITTI Eigen split, and SUN RGB-D datasets, the following metrics are commonly reported:

accuracy under the threshold (δ_i < 1.25^i, i = 1, 2, 3), where δ = max(pred/gt, gt/pred)
mean absolute relative error (AbsRel)
mean squared relative error (SqRel)
root mean squared error (RMSE)
root mean squared log error (RMSElog)
mean log10 error (log10)
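
The metrics listed above can be sketched in NumPy as follows. This is an illustrative implementation, not the evaluation code of any particular benchmark: the function name `depth_metrics`, the `min_depth`/`max_depth` defaults, and the choice to normalize SqRel by the ground-truth depth (rather than its square) are assumptions, and papers differ on these details.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Standard monocular depth metrics over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Evaluate only where ground truth exists and lies in the valid range.
    mask = (gt > min_depth) & (gt < max_depth)
    pred = np.clip(pred[mask], min_depth, max_depth)
    gt = gt[mask]

    ratio = np.maximum(pred / gt, gt / pred)
    diff = pred - gt
    return {
        "delta1": np.mean(ratio < 1.25),
        "delta2": np.mean(ratio < 1.25 ** 2),
        "delta3": np.mean(ratio < 1.25 ** 3),
        "abs_rel": np.mean(np.abs(diff) / gt),
        "sq_rel": np.mean(diff ** 2 / gt),  # assumed convention: divide by gt
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
    }

# Toy example: the last pixel has no ground truth (0) and is ignored.
gt = np.array([1.0, 2.0, 4.0, 0.0])
pred = np.array([1.0, 2.2, 6.0, 3.0])
metrics = depth_metrics(pred, gt)
print(metrics)
```

Higher is better for the δ accuracies; lower is better for all the error metrics.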

KITTI Eigen split (Outdoor)

SILog: Scale invariant logarithmic error [log(m)*100]
sqErrorRel: Relative squared error(SqRel) (percent)
absErrorRel: Relative absolute error(AbsRel) (percent)
iRMSE: Root mean squared error of the inverse depth [1/km]
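
As a rough sketch of two of these metrics: SILog is the standard deviation of the per-pixel log-depth error (so a global scale factor on the prediction cancels out), and iRMSE is the RMSE computed on inverse depth. The code below assumes depths are given in meters and converts inverse depth to 1/km; the function names are hypothetical, and the official KITTI development kit remains the reference for exact definitions.

```python
import numpy as np

def silog(pred_m, gt_m):
    """Scale invariant logarithmic error, in units of log(m) * 100."""
    pred_m = np.asarray(pred_m, dtype=np.float64)
    gt_m = np.asarray(gt_m, dtype=np.float64)
    valid = gt_m > 0
    d = np.log(pred_m[valid]) - np.log(gt_m[valid])
    # np.std(d) equals sqrt(mean(d^2) - mean(d)^2), the usual SILog form.
    return np.std(d) * 100.0

def irmse(pred_m, gt_m):
    """RMSE of the inverse depth in 1/km, assuming input depths in meters."""
    pred_m = np.asarray(pred_m, dtype=np.float64)
    gt_m = np.asarray(gt_m, dtype=np.float64)
    valid = gt_m > 0
    inv_pred = 1000.0 / pred_m[valid]  # 1/m -> 1/km
    inv_gt = 1000.0 / gt_m[valid]
    return np.sqrt(np.mean((inv_pred - inv_gt) ** 2))

gt = np.array([5.0, 10.0, 20.0, 40.0])
# A prediction that is wrong only by a global scale has (near) zero SILog.
print(silog(2.0 * gt, gt))
print(irmse(np.array([10.0, 25.0]), np.array([10.0, 20.0])))
```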

Model

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

A highly practical solution for robust monocular depth estimation, trained on a combination of 1.5M labeled images and 62M+ unlabeled images.

[Paper] Depth Anything: https://arxiv.org/pdf/2401.10891v1.pdf
[Git] LiheYoung/Depth-Anything: https://github.com/LiheYoung/Depth-Anything
[Git] fabio-sim/Depth-Anything-ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX
[Git] spacewalk01/depth-anything-tensorrt C++: https://github.com/spacewalk01/depth-anything-tensorrt
[PaperWithCode] Depth Anything: https://paperswithcode.com/paper/depth-anything-unleashing-the-power-of-large#code

NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion

[Git] NDDepth: Normal-Distance Assisted Monocular Depth Estimation and Completion: https://github.com/ShuweiShao/NDDepth?tab=readme-ov-file

IEBins: Iterative Elastic Bins for Monocular Depth Estimation

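IEBins: iterative elastic bins for classification-regression-based MDE (Monocular Depth Estimation)

Here, a bin is an interval that divides a continuous range into small regions (the same idea is often used to partition a 3D space such as a point cloud). In classification-regression-based MDE, the depth range is divided into bins, and IEBins groups and adjusts these bins adaptively over several iterations.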
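
To make the bin idea concrete, the sketch below shows the generic classification-regression scheme: a network outputs one score per depth bin, and the depth is the probability-weighted average of the bin centers. It also shows one naive refinement step that re-divides a chosen bin into finer bins. This is a simplified illustration and not the IEBins implementation; in particular, the elastic adjustment of the bin width is omitted, and the function names are hypothetical.

```python
import numpy as np

def depth_from_bins(logits, bin_edges):
    """Depth as the expected bin center under softmax(logits).

    logits: (..., K) scores per bin; bin_edges: (K + 1,) increasing depths.
    """
    bin_edges = np.asarray(bin_edges, dtype=np.float64)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Numerically stable softmax over the bin dimension.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= np.sum(probs, axis=-1, keepdims=True)
    return np.sum(probs * centers, axis=-1), probs

def refine_bins(bin_edges, target_index, num_bins):
    """Re-divide the selected target bin into num_bins finer uniform bins."""
    lo, hi = bin_edges[target_index], bin_edges[target_index + 1]
    return np.linspace(lo, hi, num_bins + 1)

# Stage 1: five coarse bins over 0..10 m for a single pixel.
edges = np.linspace(0.0, 10.0, 6)
logits = np.array([0.0, 1.0, 4.0, 1.0, 0.0])
depth, probs = depth_from_bins(logits, edges)

# Stage 2: search more finely inside the most likely bin.
target = int(np.argmax(probs))
fine_edges = refine_bins(edges, target, 4)
print(depth, target, fine_edges)
```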

[Git] ShuweiShao/IEBins: https://github.com/ShuweiShao/IEBins

1-2) Stereo Depth Estimation

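You might wonder why deep learning is needed for stereo at all, since geometry alone already yields depth for most pixels. In practice, depth obtained from classical epipolar geometry is often quite poor. The plane sweep stereo algorithm can produce a dense depth image, but the result is riddled with holes, which makes it hard to use.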

camera parameters (calibration): used to convert 2D → 3D
Cost volume: measures similarity by computing the per-pixel intensity difference between the two images
disparity cost volume (allows a more accurate 3D point cloud estimate): the component that uses the stereo vision principle to generate a 3D point cloud from 2D images
depth cost volume: estimates depth from the differences found by matching the two images → used to build a pseudo-LiDAR point cloud
commodity = stereo camera
Stereo matching algorithm
- Matching cost computation
- Cost aggregation: aggregates the information in the cost volume → increases reliability
- Disparity computation / optimization: makes the conversion to distance possible
- Disparity refinement
stereo disparity estimation: the process of estimating, for a left/right image pair captured by two horizontally offset cameras, the difference in horizontal position of corresponding pixels
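
The first three stereo matching steps above can be sketched with a classical block-matching baseline: an absolute-difference cost volume, box-filter aggregation, and winner-take-all disparity selection. This is a minimal illustration under the assumption of a rectified grayscale pair; the function names are hypothetical, the refinement step (for example a left-right consistency check) is left out, and learned methods replace each stage with network layers.

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """Matching cost: cost[d, y, x] = |left[y, x] - right[y, x - d]|."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    # Pixels with x < d have no counterpart in the right image: cost = inf.
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost

def aggregate(cost, radius):
    """Cost aggregation: average each disparity slice over a square window."""
    k = 2 * radius + 1
    h, w = cost.shape[1:]
    padded = np.pad(cost, ((0, 0), (radius, radius), (radius, radius)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def winner_take_all(cost):
    """Disparity computation: pick the disparity with the lowest cost."""
    return np.argmin(cost, axis=0)

# Synthetic pair: every point in the left image appears 4 px further left
# in the right image, so the true disparity is 4 everywhere.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
true_disp = 4
right = np.roll(left, -true_disp, axis=1)

disp = winner_take_all(aggregate(cost_volume(left, right, max_disp=8), radius=1))
print(disp[10, 10:20])
```

Columns near the left border cannot be matched at large disparities, which is one source of the holes mentioned above.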

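That said, stereo cameras have a short effective range (the focal length and baseline limit how far they can measure reliably), so they are not used as much these days. I recommend focusing on mono depth estimation rather than on these algorithms.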

2. Depth Completion

Goal: combine the sparse depth obtained from LiDAR with the corresponding RGB image to produce a complete depth image (completing and reconstructing).

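Projecting the LiDAR points onto the image gives a sparse depth map. Feeding this map together with the RGB image into a deep learning model to obtain a dense depth map is called depth completion. The result is often used as ground truth (GT) for depth estimation.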
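
The projection step that produces the sparse depth map can be sketched as follows. The sketch assumes the LiDAR points have already been transformed into the camera frame (x right, y down, z forward) and that a pinhole intrinsic matrix K is known; the function name and the example values are hypothetical. The completion network itself is not shown.

```python
import numpy as np

def lidar_to_sparse_depth(points_cam, K, height, width):
    """Project 3D points (N, 3) in the camera frame into a sparse depth map.

    Pixels without any LiDAR return are left at 0.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    pts = points_cam[points_cam[:, 2] > 0]  # keep points in front of the camera
    uvw = pts @ K.T                         # pinhole projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], pts[inside, 2]

    depth = np.full((height, width), np.inf)
    # If several points fall on the same pixel, keep the nearest one.
    np.minimum.at(depth, (v, u), z)
    depth[np.isinf(depth)] = 0.0
    return depth

# Hypothetical intrinsics and a handful of points (meters).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 40.0],
              [0.0, 0.0, 1.0]])
points = np.array([
    [0.0, 0.0, 10.0],   # projects to the principal point (u=50, v=40)
    [0.0, 0.0, 5.0],    # same pixel, but closer
    [1.0, 0.0, 10.0],   # projects to (u=60, v=40)
    [0.0, 0.0, -1.0],   # behind the camera, dropped
    [100.0, 0.0, 1.0],  # outside the image, dropped
])
sparse = lidar_to_sparse_depth(points, K, height=80, width=100)
print(np.count_nonzero(sparse), sparse[40, 50], sparse[40, 60])
```

Only a small fraction of pixels end up with a value, which is why an image-guided model is needed to fill in the rest.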

It is also used to fill in the interior of maps obtained from SLAM.

Deep Depth Estimation from Visual-Inertial SLAM

Dataset & Benchmark

NYU-Depth V2 (indoor)

[PaperWithCode] Monocular Depth Completion on NYU-Depth V2: https://paperswithcode.com/sota/depth-completion-on-nyu-depth-v2

KITTI Eigen split (Outdoor)

[PaperWithCode] Depth Completion: https://paperswithcode.com/task/depth-completion/codeless
[KITTI] Depth Completion Evaluation: https://www.cvlibs.net/datasets/kitti/eval_depth.php?benchmark=depth_completion

References

[Youtube] Image-guided Depth Completion: A Non-linear Filters, Convolutions, and Transformer [Kim Kyeongseon]: https://www.youtube.com/watch?v=-tR5rYfin48
[Blog] Depth Estimation, Completion study: https://velog.io/@openjr/Depth-Eestimation-Completion-%EB%8F%99%ED%96%A5-%EA%B3%B5%EB%B6%80
