[Recommended Article] Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

Original: Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

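Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers. Paper: https: arxiv.org pdf . Pre-trained models are booming, and multi-modal pre-training has kept pace. In the image-language domain, how to learn the two modalities jointly is a popular research direction. This paper proposes a model based on a cross-modal transformer, ...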

2020-12-23 11:55 0 591 Recommendation index:

Paper Read: Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

Robust Deep Multi-modal Learning Based on Gated Information Fusion Network 2018-07-27 14:25:26 Paper: https://arxiv.org/pdf/1807.06233.pdf ...

Multi-modal Knowledge Graphs for Recommender Systems - 1 - Paper Study

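Multi-modal Knowledge Graphs for Recommender Systems ABSTRACT In a wide range of online applications, recommender systems have shown great potential for tackling the information explosion problem and improving user experience ...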

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

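Abstract: The paper proposes a new language representation model, BERT: Bidirectional Encoder Representations from Transformers. Unlike earlier language representation models, it can use information from both the left and right context at every position in every layer during learning ...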

Literature Reading Report - Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs

Citation: Amirian J, Hayet J B, Pettre J. Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs[J]. 2019. This article follows ...

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

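BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Abstract: We introduce a new language representation model called BERT, which uses bidirectional encoder representations from Transformers. Unlike recent language representation models, BERT ...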

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video - 1 - Paper Study

MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video ABSTRACT ...

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition -- Paper

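Article title: Multi-Modal Domain Adaptation for Fine-Grained Action Recognition 1. Introduction First, a few terms need defining. Multi-Modal: each source of information can be called a modality, and a multi-modal method processes two or more sources of information at the same time. For example, a video carries visual, audio, and subtitle information; a method that considers the visual and audio streams together is multi-modal. Domain ...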

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Contents: Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data 1. Paper Overview 2. Module Details 2.1 DetNet 2.2 ...
