[B! text-to-audio] arrowKato's bookmarks

arrowKato id:arrowKato

arrowKato's bookmarks tagged text-to-audio (1)

ImageBind: One Embedding Space To Bind Them All
We present ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together. ImageBind can leverage recent large scale vision-language models, and extends their
arrowKato 2025/04/29
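Lets you measure the distance between different modalities across text, images, audio, depth, thermal, and IMU (inertial measurement unit) data. The demo at https://imagebind.metademolab.com/demo makes it easy to see what it can do. The larger the cosine similarity, the more similar the two items are (equivalently, the smaller the cosine distance).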
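The cross-modal comparison in the comment above comes down to comparing embedding vectors by cosine similarity. Here is a minimal sketch, assuming toy 3-dimensional vectors in place of real ImageBind embeddings; the names `text_dog`, `audio_bark`, and `audio_rain` and their values are hypothetical and not taken from the model or the demo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Common convention: distance = 1 - similarity, so smaller means closer."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical toy embeddings standing in for outputs of a joint embedding model.
text_dog = [0.9, 0.1, 0.0]    # assumed embedding of the text "a dog"
audio_bark = [0.8, 0.2, 0.1]  # assumed embedding of a barking sound
audio_rain = [0.0, 0.2, 0.9]  # assumed embedding of a rain sound

sim_bark = cosine_similarity(text_dog, audio_bark)
sim_rain = cosine_similarity(text_dog, audio_rain)

# The better match has the higher similarity and the lower distance.
print(f"text vs bark: similarity={sim_bark:.3f}, distance={1 - sim_bark:.3f}")
print(f"text vs rain: similarity={sim_rain:.3f}, distance={1 - sim_rain:.3f}")
```

With these assumed vectors, the text is closer to the barking audio than to the rain audio, which is the kind of ranking a text-to-audio retrieval over a shared embedding space would use.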

image-to-text

audio-to-image

text-to-image

text-to-audio

audio&image-to-image

audio-to-generated_image
