Back to Glossary
Multimodal AI
マルチモーダルAI(マルチモーダルエーアイ)
IntermediateModels & Architecture
AI systems that can understand and generate multiple types of data — text, images, audio, and video — simultaneously.
Why It Matters
Multimodal AI mirrors how humans perceive the world — through multiple senses — making AI more versatile.
Example in Practice
GPT-4V analyzing a photo of a math problem and solving it, or describing what's in an image.