[Paper Review] NeRF in the Wild (NeRF-W): NeRF with Real World + Embedding - Neural Radiance Fields for Unconstrained Photo Collections (CVPR 2021 Oral)

DrawingProcess 2024. 11. 14. 02:12
💡 This post summarizes the paper 'NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections (CVPR 2021 Oral)'.
The paper addresses 3D reconstruction from datasets of photos taken by tourists (unstructured tourist environments). It builds on NeRF, and its distinguishing features are latent appearance modeling and a separate transient network with uncertainty.

 - Project: https://www.lerf.io/
 - Paper: https://arxiv.org/abs/2303.09553
 - Github: https://github.com/kerrj/lerf
 - Dataset: https://drive.google.com/drive/folders/1vh0mSl7v29yaGsxleadcj-LCZOE_WEWB

Abstract

The original NeRF only handles static subjects, so it does not cope with photos that show real-world phenomena such as variable illumination or transient occluders. This paper therefore applies NeRF-based 3D reconstruction to unstructured image collections.

The main contributions of the paper can be summarized as follows.

  • Use Latent Appearance Modeling to control appearance changes (illumination) in the output
  • Separate the static network from the transient network so that transient objects can be removed (+ Uncertainty)

Methods

Architecture

NeRF-W does not differ much from the NeRF network. Comparing the two networks by their inputs and outputs, the parts highlighted in color are identical, and only the remaining parts are new.

The new parts, namely the Appearance Embedding and the Transient Embedding and Uncertainty (β) attached to MLP_3, are covered one by one below.

Static Network

The Static Network is the original NeRF model with only an Appearance Embedding added.

The Appearance Embedding is an embedding vector created with nn.Embedding; it is randomly initialized and then learned through the MLP. It is a per-image embedding vector, so the appearance can later be adjusted by modifying that vector (style adjustment).
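The lookup-and-edit idea above can be sketched in a few lines. This is only a toy illustration using the Python standard library: `EmbeddingTable` is a hypothetical stand-in for a learnable lookup table such as nn.Embedding (no training is done here), and feeding the vector to the color branch by concatenation with the features and the encoded view direction is an assumption, not something stated in this post.

```python
import random


class EmbeddingTable:
    """Toy stand-in for a learnable lookup table such as nn.Embedding."""

    def __init__(self, num_images, dim, seed=0):
        rng = random.Random(seed)
        # One randomly initialized vector per training image.
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_images)]

    def __call__(self, image_id):
        return self.weight[image_id]


def color_mlp_input(features, view_dir_enc, appearance):
    """Concatenate the (assumed) inputs of the color branch into one vector."""
    return list(features) + list(view_dir_enc) + list(appearance)


def interpolate(emb_a, emb_b, t):
    """Linearly blend two appearance vectors (t=0 gives emb_a, t=1 gives emb_b)."""
    return [(1.0 - t) * a + t * b for a, b in zip(emb_a, emb_b)]


table = EmbeddingTable(num_images=4, dim=3)
# "Style adjustment": render with a vector halfway between images 0 and 1.
blended = interpolate(table(0), table(1), 0.5)
x = color_mlp_input([0.1, 0.2], [0.0, 1.0], blended)
```

Because geometry does not depend on the appearance vector in this sketch, swapping or blending vectors changes only the color input, which is what makes appearance editable after training.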

Because the appearance embeddings are learned on the training set, at test time a similar vector is picked from the learned embedding vectors to match the target image.
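One way to read "pick a similar vector" is to try each learned embedding and keep the one whose rendering is closest to the target image. The sketch below assumes exactly that; `select_embedding`, `photometric_error`, and the toy renderer are hypothetical names for illustration, and optimizing a fresh vector against the target would be another option.

```python
def photometric_error(rendered, target):
    """Mean squared error between two equally sized lists of pixel values."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)


def select_embedding(embeddings, render_fn, target_pixels):
    """Return (index, vector) of the learned embedding that best matches the target."""
    best_idx = min(
        range(len(embeddings)),
        key=lambda i: photometric_error(render_fn(embeddings[i]), target_pixels),
    )
    return best_idx, embeddings[best_idx]


# Toy renderer: the embedding acts as a per-channel brightness gain.
base = [0.2, 0.4, 0.6]


def toy_render(emb):
    return [b * g for b, g in zip(base, emb)]


learned = [[0.5, 0.5, 0.5], [1.0, 1.0, 1.0], [1.5, 1.2, 0.9]]
target = [0.29, 0.5, 0.55]
idx, emb = select_embedding(learned, toy_render, target)
```

In a real pipeline `render_fn` would render the target camera pose with the trained network, which is far more expensive than this toy gain model.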

Transient Network

The Transient Network reuses MLP_1, which holds the 3D shape information of the NeRF model, as is, and adds MLP_3, which is trained so that transient objects (occluders) can be removed.
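To make the static/transient split concrete, here is a minimal sketch for a single ray and a single color channel. It assumes the usual alpha-compositing form of volume rendering, with the transient density added to the static density when computing transmittance; `render_ray` and its arguments are hypothetical names. Leaving out the transient inputs gives the static-only rendering, which is how the occluder disappears.

```python
import math


def render_ray(deltas, sigma_s, color_s, sigma_t=None, color_t=None):
    """Alpha-composite one scalar color channel along a ray.

    With sigma_t/color_t given, static and transient samples are blended;
    with them omitted, only the static scene is rendered.
    """
    n = len(deltas)
    if sigma_t is None:
        sigma_t, color_t = [0.0] * n, [0.0] * n
    transmittance = 1.0  # fraction of light not yet absorbed
    pixel = 0.0
    for k in range(n):
        alpha_s = 1.0 - math.exp(-sigma_s[k] * deltas[k])
        alpha_t = 1.0 - math.exp(-sigma_t[k] * deltas[k])
        pixel += transmittance * (alpha_s * color_s[k] + alpha_t * color_t[k])
        # Both densities attenuate whatever lies behind this sample.
        transmittance *= math.exp(-(sigma_s[k] + sigma_t[k]) * deltas[k])
    return pixel


deltas = [1.0, 1.0, 1.0]
sigma_s, color_s = [0.0, 0.0, 50.0], [0.0, 0.0, 0.8]   # bright wall at the far sample
sigma_t, color_t = [5.0, 0.0, 0.0], [0.1, 0.0, 0.0]    # dark occluder at the near sample

with_occluder = render_ray(deltas, sigma_s, color_s, sigma_t, color_t)
static_only = render_ray(deltas, sigma_s, color_s)
```

In this toy ray the combined rendering is dominated by the dark occluder, while the static-only rendering recovers the wall color.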

The network also outputs an Uncertainty, which is learned only through the loss term. Just as color is rendered into pixel values, the uncertainty can be rendered as well; the result can be written as in the equation below, and the extracted result looks like (e) Uncertainty.
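Rendering uncertainty "like color" can be sketched by compositing a per-sample β along the ray with the transient density. Everything specific here is an assumption for illustration: the softplus mapping that keeps β positive, the `beta_min` floor and its value, and the choice to add the floor after compositing so that rays without transient density still get a positive value.

```python
import math


def softplus(x):
    """Smooth positive mapping log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def render_uncertainty(deltas, sigma_t, beta_raw, beta_min=0.03):
    """Composite per-sample uncertainty along a ray using transient density.

    beta_min is an illustrative floor, not a value taken from this post.
    """
    transmittance = 1.0
    beta_pixel = 0.0
    for delta, sigma, raw in zip(deltas, sigma_t, beta_raw):
        alpha = 1.0 - math.exp(-sigma * delta)
        beta_pixel += transmittance * alpha * softplus(raw)
        transmittance *= 1.0 - alpha
    return beta_pixel + beta_min


# A ray that hits a dense transient object vs. a ray that hits none.
occluded = render_uncertainty([1.0, 1.0], [100.0, 0.0], [0.0, 0.0])
clear = render_uncertainty([1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

Pixels whose rays pass through transient density end up with a larger rendered β, so a loss that divides the color error by β² would down-weight them.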

Volume Rendering

Optimization

Implementation Details

  • Camera poses are estimated with COLMAP
  • 300,000 iterations in total, batch size 2048, on 8 Nvidia V100 GPUs

Experimental

 
