From the course: Multimodal AI Essentials: Merging Text, Image, and Audio for Next-Generation AI Applications
Future trends and innovations in multimodal AI
- So breaking down future trends is always going to be a bit like peering into a crystal ball. I may be right, I may be wrong, but to give myself some credit, let's try to look at future trends in three main categories. Number one, models (LLMs, LMMs, and multimodal applications in general) will get more capable and unlock higher-level tasks. So for example, what is currently relatively difficult for most multimodal image systems to do is to actually read small, fine-grained text on very long documents without a separate process for converting the document into text first. Sora, the video diffusion model from OpenAI, maybe one day won't make a double-headed horse for me when I ask it to draw a horse galloping behind me. We're always going to need newer and more advanced benchmarks to try to tackle these more capable models as they get better over time. Number two, systems will get creative in their modalities. We've already seen several examples of how…