
[Gen AI] Comparing the Principles of Generative Models: VAE, GAN, Flow-based, Diffusion

DrawingProcess 2024. 9. 14. 08:16
💡 This post summarizes '[Gen AI] Comparing the Principles of Generative Models: VAE, GAN, Flow-based, Diffusion'.
It compares four representative generative models (VAE, GAN, flow-based, and diffusion) and outlines how each one generates data from a latent variable.

1. Prerequisite

1) Markov Chain

A discrete-time stochastic process that has the Markov property.

  • Markov property: "The probability of the state at time t+1 depends only on the state at the current time t."
  • Discrete-time stochastic process: a random phenomenon that unfolds over discrete time steps (0 s, 1 s, ...).

$$ P[s_{t+1} \mid s_t] = P[s_{t+1} \mid s_1, \ldots, s_t] $$

e.g. "Tomorrow's weather can be predicted from today's weather alone."
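
The weather example can be sketched in code. This is a minimal illustration, assuming a hypothetical two-state chain; the state names and transition probabilities below are made up for the example and do not come from the post. Each step samples the next state using only the current one, which is exactly the Markov property.

```python
import numpy as np

# Hypothetical two-state weather model; the numbers are illustrative only.
STATES = ["sunny", "rainy"]
# TRANSITION[i, j] = P(next state = j | current state = i); each row sums to 1.
TRANSITION = np.array([
    [0.8, 0.2],
    [0.4, 0.6],
])


def simulate(n_steps, start=0, seed=0):
    """Sample a state sequence; each step looks only at the current state."""
    rng = np.random.default_rng(seed)
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        path.append(int(rng.choice(len(STATES), p=TRANSITION[current])))
    return path


def distribution_after(n_steps, start=0):
    """Exact state distribution after n_steps, via repeated matrix products."""
    dist = np.zeros(len(STATES))
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = dist @ TRANSITION
    return dist
```

Calling `simulate(7)` gives one possible week of weather indices, while `distribution_after(n)` shows how the probability of each state evolves without sampling.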

2) Normalizing Flow

A normalizing flow is one of the deep-neural-network-based probabilistic generative models. It is a latent variable (Z) based probabilistic generative model that uses the change-of-variables formula to obtain the latent variable (Z).
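
For reference, the change-of-variables formula is usually written as follows. The symbols here are assumed notation rather than the post's own: f is an invertible, differentiable map from data x to latent z, and p_X and p_Z are the data and latent densities.

$$ z = f(x), \qquad p_X(x) = p_Z(f(x)) \left| \det \frac{\partial f(x)}{\partial x} \right| $$

The determinant term accounts for how f stretches or compresses volume, so that p_X still integrates to 1.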

2. Probabilistic Generative Model: Latent variable model

1) Overview of Generative Models

  • ๋ฐ˜๋ณต์ ์ธ ๋ณ€ํ™”(iterative transformation)๋ฅผ ํ™œ์šฉํ•œ๋‹ค๋Š” ์ ์—์„œ Flow-based models์™€ ์œ ์‚ฌ
  • ๋ถ„ํฌ์— ๋Œ€ํ•œ ๋ณ€๋ถ„์  ์ถ”๋ก (Variational Inference)์„ ํ†ตํ•œ ํ•™์Šต์„ ์ง„ํ–‰ํ•œ๋‹ค๋Š” ์ ์€ VAE์™€ ์œ ์‚ฌ
  • ์ตœ๊ทผ์—๋Š” Diffusion ๋ชจ๋ธ์˜ ํ•™์Šต์— Adversarial Training์„ ํ™œ์šฉํ•˜๊ธฐ๋„ ํ•จ (Diffusion-GAN, 2022)

2) Generative model: Latent variable model

Ultimately, what we want from a generative model is to transform (mapping, transformation, sampling) a very simple distribution (Z) into a distribution that has a particular pattern. For this reason, most generative models obtain a latent variable (Z) from the given input data and try to learn how to transform it.

3) Variational Auto Encoder

  • The trained decoder network maps the latent variable to a distribution with a particular pattern.
  • An encoder is added to the model architecture, so the latent variable, the encoder, and the decoder are all learned.
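
The two pieces that tie the encoder and decoder together during training can be sketched as below. This is a minimal sketch under assumptions the post does not state: the encoder is assumed to output the mean `mu` and log-variance `log_var` of a diagonal Gaussian, the prior is a standard normal, and the reconstruction term is a squared error. All function names are hypothetical.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def negative_elbo(x, x_recon, mu, log_var):
    """Squared-error reconstruction term plus the KL regularizer (assumed loss)."""
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)
```

In a full model, `x_recon` would come from passing the sampled `z` through the decoder, and minimizing `negative_elbo` would update the encoder and decoder together.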

3) Generative Adversarial Network (GAN) 

  • The trained generator maps the latent variable to a distribution with a particular pattern.
  • A discriminator is added to the model architecture in order to train the generator.
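
How the discriminator drives generator training can be sketched with the two loss functions below. This is a minimal sketch, assuming binary cross-entropy and the commonly used non-saturating generator loss, neither of which the post specifies. The toy one-dimensional `generator` and `discriminator` and their parameters are hypothetical stand-ins for neural networks.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def generator(z, w=2.0, b=3.0):
    """Toy generator: maps latent z to a sample x (stand-in for a network)."""
    return w * z + b


def discriminator(x, a=1.0, c=-3.0):
    """Toy discriminator: returns the estimated probability that x is real."""
    return sigmoid(a * x + c)


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real samples labeled 1, generated samples labeled 0."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: the generator is rewarded when D(G(z)) is near 1."""
    return -np.mean(np.log(d_fake + eps))
```

Training would alternate between lowering `discriminator_loss` with respect to the discriminator's parameters and lowering `generator_loss` with respect to the generator's parameters.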

4) Flow-based Model

Fundamentally, the goal is to transform a simple, tractable prior distribution into a complex distribution. To do this, the model uses the inverse mapping of a learned invertible function; such a function is called a flow and is used for generation.

  • The inverse mapping of the trained flow model maps the latent variable to a distribution with a particular pattern.
  • To learn the inverse mapping used for generation, the model learns an invertible function.
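
The relationship between the invertible function and its inverse can be sketched with the simplest possible flow. This is a minimal sketch, assuming a single elementwise affine map and a standard normal prior; the class `AffineFlow` and its parameters are hypothetical, and real flow models stack many learned invertible layers.

```python
import numpy as np


class AffineFlow:
    """Elementwise invertible map z = f(x) = (x - shift) / scale."""

    def __init__(self, shift, scale):
        self.shift = float(shift)
        self.scale = float(scale)  # must be non-zero for the map to be invertible

    def forward(self, x):
        """Data -> latent: the direction used to evaluate likelihoods."""
        return (x - self.shift) / self.scale

    def inverse(self, z):
        """Latent -> data: the direction used for generation."""
        return z * self.scale + self.shift

    def log_prob(self, x):
        """log p_X(x) = log p_Z(f(x)) + log|df/dx| with a standard normal prior."""
        z = self.forward(x)
        log_pz = -0.5 * (z**2 + np.log(2.0 * np.pi))
        log_det = -np.log(abs(self.scale))
        return log_pz + log_det


def sample(flow, n, seed=0):
    """Generate n samples by pushing prior draws through the inverse mapping."""
    rng = np.random.default_rng(seed)
    return flow.inverse(rng.standard_normal(n))
```

Training would maximize `log_prob` on data, which fits `forward`; generation then comes for free through `inverse`.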

5) Diffusion based generative model

Diffusion models also aim, fundamentally, to transform a simple, tractable prior distribution into a complex one.

  • The learned conditional distribution P(x|z) of the diffusion model yields the distribution with the desired pattern.
  • To learn the conditional distribution P(x|z) used for generation, the diffusion process q(z|x) is used.
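
The diffusion process q(z|x) can be sketched as below, following the closed-form forward step used in DDPM. This is a minimal sketch under assumptions: the linear noise schedule with 1000 steps and the function names are choices made for this example, `x0` plays the role of the data x, and the noised `x_t` plays the role of the latent z.

```python
import numpy as np


def linear_beta_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, n_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


betas = linear_beta_schedule()
# alpha_bar[t] is the fraction of the original signal variance left at step t.
alpha_bar = np.cumprod(1.0 - betas)
```

At small `t` the sample stays close to the data, and at the last step it is almost pure Gaussian noise. The reverse model P(x|z) is then trained to undo this corruption one step at a time, for example by predicting the returned `noise`.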

References

[Youtube] [Paper Review] Denoising Diffusion Probabilistic Models: https://www.youtube.com/watch?v=_JQSMhqXw-4
