From the course: Multimodal AI Essentials: Merging Text, Image, and Audio for Next-Generation AI Applications

Future trends and innovations in multimodal AI

- So breaking down future trends is always going to be a bit like peering into a crystal ball. I may be right, I may be wrong, but to give myself some credit, let's try to look at future trends in three main categories. Number one, models (LLMs, LMMs, and multimodal applications in general) will get more capable and unlock higher-level tasks. So for example, what is currently relatively difficult for most multimodal image systems to do is to actually read small, fine-grained text on very long documents without a separate process for converting the document into text first. Sora, the video diffusion model from OpenAI, maybe one day won't make a double-headed horse for me when I ask it to draw a horse galloping behind me. We are always going to need newer and more advanced benchmarks to try to tackle these more capable models as they get better over time. Number two, systems will get creative in their modalities. We've already seen several examples of how…
