DrawingProcess
[2D Vision] Yonsei YAI Foundations & Advanced CV: Transformer & Vision Transformer

2025. 8. 21. 10:25
๋ฐ˜์‘ํ˜•
๐Ÿ’ก This post is a summary of '[2D Vision] Yonsei YAI Foundations & Advanced CV: Transformer & Vision Transformer'.
It covers the Transformer fundamentals and module structure needed to understand ViT, along with ViT's core ideas.

1. Transformer (Attention Is All You Need, 2017)

Background

The Transformer was first proposed in the 2017 paper “Attention Is All You Need”.

  • It was designed to overcome the sequential-processing limits and vanishing-gradient problem of RNN-based models. Because an RNN processes data step by step, it is hard to parallelize, and information from the distant past fades as the input grows longer.
  • A CNN can be parallelized, but its local receptive field makes it hard to learn dependencies between far-apart positions.

To address these problems, the Transformer processes sequence data using only the attention mechanism, offering high accuracy, excellent parallelism, and fast training.

Methods

The Transformer consists of an Encoder and a Decoder, each built by stacking identical layers. An Encoder layer contains Multi-Head Self-Attention and a Position-wise Feed-Forward network; a Decoder layer adds Masked Multi-Head Attention on top of these. Input tokens are mapped to fixed-dimensional vectors by an Embedding Layer, and a Positional Encoding is added to inject order information. The output is converted into probabilities over the next token by a Linear transformation followed by Softmax.
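As a minimal sketch (not code from the original series), the fixed sinusoidal positional encoding from the paper can be written in PyTorch as follows; the function name and toy sizes are my own:

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimensions
    angle = pos / torch.pow(10000.0, i / d_model)                   # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe

# Order information is added to the token embeddings, as described above
tokens = torch.randn(10, 512)            # (seq_len, d_model) toy embeddings
x = tokens + sinusoidal_positional_encoding(10, 512)
```

Because the encoding is a deterministic function of position, it needs no training and extrapolates to sequence lengths not seen during training.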

Transformer์˜ ํ•ต์‹ฌ ๋ชจ๋“ˆ์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ 3๊ฐ€์ง€ ์ž…๋‹ˆ๋‹ค.

  • Scaled Dot-Product Attention: computes similarities between Query and Key vectors, scales them by 1/√d_k, and uses the Softmax of these scores to take a weighted sum of the Value vectors, producing a context-aware representation.
  • Multi-Head Attention: trains several Attention heads in parallel so each can extract information from a different representation subspace, then combines them for a richer representation.
  • Self-Attention: lets each token attend to the rest of the same sequence, while Encoder-Decoder Attention lets the Decoder pull information from the Encoder effectively. In the Decoder, Masked Attention is applied so that future tokens cannot be referenced.
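The scaled dot-product attention and the decoder-side masking described above can be sketched in PyTorch; this is an illustrative implementation, not code from the lecture:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.
    `mask` (optional) is True at positions to block, e.g. future
    tokens in the Decoder's Masked Attention."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (L_q, L_k) similarities
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # each row sums to 1
    return weights @ v, weights

# Masked (causal) self-attention over a toy sequence: Q = K = V = x
L, d = 4, 8
x = torch.randn(L, d)
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
out, w = scaled_dot_product_attention(x, x, x, mask=causal)
```

Multi-Head Attention simply runs this routine h times on learned projections of Q, K, V and concatenates the results.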

2. Vision Transformer: An Image is Worth 16x16 Words (ViT, 2021)

Vision Transformer (ViT) splits an image into fixed-size patches, treats each patch as a token, and feeds them to a Transformer. Each patch is embedded via a linear projection, and a Positional Encoding is added to preserve order information. Training then proceeds through Transformer Encoder blocks, each consisting of Multi-Head Self-Attention, an MLP, Residual Connections, and Layer Normalization.
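The patchify-and-embed step can be sketched as follows (my own illustrative code; the class name and the ViT-Base-like sizes 224/16/768 are assumptions):

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into 16x16 patches and linearly embed each one.
    A stride-16 convolution is equivalent to flattening each patch and
    applying a shared linear projection."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3, dim=768):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))               # [class] token
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))  # learnable

    def forward(self, x):                             # x: (B, C, H, W)
        x = self.proj(x).flatten(2).transpose(1, 2)   # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                # prepend [class] token
        return x + self.pos_embed                     # add positional embedding

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))  # (2, 197, 768)
```

A 224×224 image yields 14×14 = 196 patch tokens plus the [class] token, giving the sequence length 197.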

ViT์˜ ๊ฐ€์žฅ ํฐ ์žฅ์ ์€ CNN๊ณผ ๋‹ฌ๋ฆฌ ์ „์—ญ์ ์ธ ๋ฌธ๋งฅ ์ •๋ณด๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์™€ ์—ฐ์‚ฐ ์ž์›์ด ํ•„์š”ํ•˜๋‹ค๋Š” ๋‹จ์ ๋„ ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. CNN์€ ์ง€์—ญ ํŒจํ„ด์— ๊ฐ•ํ•˜์ง€๋งŒ ์žฅ๊ฑฐ๋ฆฌ ์˜์กด์„ฑ ํ•™์Šต์ด ์–ด๋ ค์šด ๋ฐ˜๋ฉด, ViT๋Š” ์žฅ๊ฑฐ๋ฆฌ ์˜์กด์„ฑ์„ ์‰ฝ๊ฒŒ ํ•™์Šตํ•  ์ˆ˜ ์žˆ์–ด ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์—์„œ ํŠนํžˆ ๊ฐ•๋ ฅํ•œ ์„ฑ๋Šฅ์„ ๋ฐœํœ˜ํ•ฉ๋‹ˆ๋‹ค.

Discussion

  • NLP์—์„œ๋Š” ๋‹จ์–ด๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋ณด๋Š”๋ฐ, ViT์—์„œ๋Š” Patch ๊ฐ„์˜ ์œ ์‚ฌ๋„๋ฅผ ๋ณด๋Š” ๊ฒƒ์ธ๊ฐ€?
    • ๋งž๋‹ค.
  • Positional Embedding์„ ์™œ Learnableํ•˜๊ฒŒ ๊ตฌ์„ฑํ•˜์ง€?
    • ๋‹จ์ˆœํ•œ patch์˜ ์œ„์น˜์ •๋ณด ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ Patch๋“ค ๊ฐ„์˜ ์œ ์‚ฌ๋„๋„ ํ•™์Šตํ•˜๊ธฐ ์œ„ํ•ด์„œ
  • Transformer ๊ตฌ์กฐ๊ฐ€ ์ด๋ฏธ์ง€๋ž‘์€ ์–ด์šธ๋ฆฌ์ง€ ์•Š๋Š” ๋ชจ์Šต์„ ๋ณด์ผ ์ˆ˜ ์žˆ์„ ๊ฑฐ ๊ฐ™๋‹ค. ์ด๋ฅผ ์–ด๋–ป๊ฒŒ ํ•ด๊ฒฐํ•ด๋‚˜๊ฐ€๊ณ  ์žˆ๋Š”๊ฐ€?
    • ์ผ๋ฐ˜ํ™”๊ฐ€ ๋” ์ข‹์•„์งˆ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์•„๋ณด์ด๊ธฐ๋„ ํ•˜๋‹ค!
  • transformer ์—์„œ decoder๋ฅผ vit์— ๊ฒฐํ•ฉ์‹œํ‚ค๋ฉด ์ด๋ฏธ์ง€ ์ƒ์„ฑ๋„ ๊ฐ€๋Šฅํ•œ๊ฐ€?
    • ๊ฐ€๋Šฅํ•˜๋‹ค. masked_vit model์„ ํ†ตํ•ด์„œ mask ๋˜์ง€ ์•Š์€ ๋ถ€๋ถ„์„ ํ•™์Šต์‹œํ‚ค๊ณ  ์ด๋ฅผ ์ถ”๊ฐ€๋กœ
  • class ํ† ํฐ์ด ์–ด๋–ป๊ฒŒ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ์ง€?
    • class ํ† ํฐ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋ชจ๋“  ํ† ํฐ์ด ๋ชจ๋“  ์ •๋ณด๋ฅผ ๋‹ด๊ณ  ์žˆ์„ ๊ฒƒ. ํ•˜์ง€๋งŒ ์ฟผ๋ฆฌํ•˜๋Š” ๋ถ€๋ถ„์—์„œ…?
    • class ํ† ํฐ์€ random ํ•˜๊ฒŒ ์‹œ์ž‘๋˜์–ด์„œ ์ฃผ๋ณ€ ์ •๋ณด๋ฅผ localํ•˜๊ฒŒ ๋ณด์ง€ ์•Š๊ณ , ์ „์ฒด์ ์ธ ํ‰๊ท 
๋ฐ˜์‘ํ˜•
์ €์ž‘์žํ‘œ์‹œ ๋น„์˜๋ฆฌ ๋ณ€๊ฒฝ๊ธˆ์ง€ (์ƒˆ์ฐฝ์—ด๋ฆผ)
