Human-like Controllable Image Captioning with Verb-specific Semantic Roles

Reposted on 2021-04-09 00:21 (category: paper notes).

Limitations of prior work:

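Existing controllable image captioning (CIC) work mainly relies on either (1) subjective control signals or (2) objective control signals; the objective signals are further split into (1) content-controlled and (2) structure-controlled ones.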

Almost all existing objective control signals overlook two indispensable characteristics of an ideal control signal:

1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity.

2) Sample-suitable: the control signals should be suitable for a specific image sample.

Contributions of the paper:

The paper proposes a new event-oriented objective control signal, Verb-specific Semantic Roles (VSR), which meets both the event-compatible and the sample-suitable requirements simultaneously.

VSR consists of a verb and a set of user-interested semantic roles.
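A minimal sketch of a VSR as a data structure, assuming PropBank-style role labels (Arg0, Arg1, LOC, ...) and using a hypothetical class name `VSR`; the paper does not prescribe this representation, it only defines the signal as a verb plus the roles the user cares about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VSR:
    """Hypothetical container for a Verb-specific Semantic Roles control signal."""

    verb: str               # the activity the caption must describe
    roles: tuple[str, ...]  # user-interested semantic roles (PropBank-style labels assumed)

    def __post_init__(self) -> None:
        # A VSR is only meaningful with a verb and at least one role to describe.
        if not self.verb:
            raise ValueError("a VSR needs a verb")
        if not self.roles:
            raise ValueError("a VSR needs at least one semantic role")


# Hypothetical example: describe who rides what, and where.
signal = VSR(verb="ride", roles=("Arg0", "Arg1", "LOC"))
print(signal)  # VSR(verb='ride', roles=('Arg0', 'Arg1', 'LOC'))
```

Freezing the dataclass keeps a control signal immutable once it has been handed to the captioning pipeline.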

Grounded Semantic Role Labeling (GSRL): identifies and grounds the entities for each semantic role in the VSR, and outputs the visual features of all grounded proposal sets.

Semantic Structure Planner (SSP): a hierarchical semantic structure learning model that learns a reasonable sequence of sub-roles S.
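To make the input and output of the SSP concrete, here is a sketch under loud assumptions: each role arrives from GSRL with one or more grounded proposal sets (given as region ids), each proposal set becomes one sub-role, and the learned hierarchical ordering is replaced by a hypothetical fixed priority table `ROLE_PRIORITY`. It only shows the shape of a two-level plan (order the roles, then expand them into sub-roles), not the paper's learned model.

```python
# Hypothetical stand-in for the role-level order that the SSP actually learns.
ROLE_PRIORITY = {"Arg0": 0, "V": 1, "Arg1": 2, "Arg2": 3, "LOC": 4}


def plan_structure(
    grounded: dict[str, list[frozenset[int]]],
) -> list[tuple[str, frozenset[int]]]:
    """Return a flat sub-role sequence S as (role, region ids) pairs."""
    # Level 1: order the roles; unknown roles go last, keeping their input order.
    roles = sorted(grounded, key=lambda r: ROLE_PRIORITY.get(r, len(ROLE_PRIORITY)))
    # Level 2: expand each role into one sub-role per grounded proposal set.
    return [(role, regions) for role in roles for regions in grounded[role]]


# Hypothetical GSRL output: two people grounded as Arg0, one object, one location.
grounded = {
    "LOC": [frozenset({3})],
    "Arg0": [frozenset({1}), frozenset({5})],
    "Arg1": [frozenset({2})],
}
plan = plan_structure(grounded)
print([role for role, _ in plan])  # ['Arg0', 'Arg0', 'Arg1', 'LOC']
```

Swapping the fixed priority for a learned scorer would leave the function's input and output formats unchanged.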

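VSR-guided captioning = Grounded Semantic Role Labeling (GSRL) + Semantic Structure Planner (SSP): GSRL supplies the grounded region features, SSP supplies the sub-role sequence, and a captioning model generates the sentence from both.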

Steps: to combine two VSRs a and b, we first use GSRL and SSP to obtain the semantic structure and grounded region features of each: (S_a, R_a) and (S_b, R_b).

Then, as shown in the corresponding figure of the paper, we merge them in two steps:

(a) find the sub-roles in S_a and S_b that refer to the same visual regions;

(b) insert all other sub-roles between the two nearest selected sub-roles.
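The two merging steps can be sketched in Python. Assumptions, not taken from the paper: a sub-role is a hypothetical `SubRole` tuple of its role label and the ids of its grounded regions, so the region features R_a and R_b simply travel with their sub-roles; a shared sub-role keeps the label it has in S_a; sub-roles without grounded regions (such as the verb) are never treated as shared; and shared sub-roles must appear in the same order in both structures.

```python
from typing import NamedTuple


class SubRole(NamedTuple):
    """Hypothetical sub-role: a role label plus the ids of its grounded regions."""

    role: str                # e.g. "Arg0", "V", "LOC"
    regions: frozenset[int]  # empty for sub-roles with no grounded region (e.g. the verb)


def merge_structures(s_a: list[SubRole], s_b: list[SubRole]) -> list[SubRole]:
    # Step (a): shared sub-roles are grounded sub-roles whose region sets occur in both.
    regions_b = {sr.regions for sr in s_b if sr.regions}
    anchors = [sr for sr in s_a if sr.regions in regions_b]
    anchor_regions = [sr.regions for sr in anchors]
    shared = set(anchor_regions)
    order_in_b = [sr.regions for sr in s_b if sr.regions in shared]
    if order_in_b != anchor_regions:
        raise ValueError("shared sub-roles appear in different orders in S_a and S_b")

    # segments[k] collects the non-shared sub-roles that follow the k-th shared one
    # (segments[0] holds those before the first shared sub-role).
    segments: list[list[SubRole]] = [[] for _ in range(len(anchors) + 1)]
    for seq in (s_a, s_b):
        k = 0
        for sr in seq:
            if sr.regions in shared:
                k += 1
            else:
                segments[k].append(sr)

    # Step (b): place the other sub-roles between their two nearest shared sub-roles.
    merged = list(segments[0])
    for anchor, segment in zip(anchors, segments[1:]):
        merged.append(anchor)
        merged.extend(segment)
    return merged


# Hypothetical structures: "man rides horse on beach" and "man holds rope on beach".
s_a = [SubRole("Arg0", frozenset({1})), SubRole("V", frozenset()),
       SubRole("Arg1", frozenset({2})), SubRole("LOC", frozenset({3}))]
s_b = [SubRole("Arg0", frozenset({1})), SubRole("V", frozenset()),
       SubRole("Arg1", frozenset({4})), SubRole("LOC", frozenset({3}))]
merged = merge_structures(s_a, s_b)
print([sr.role for sr in merged])  # ['Arg0', 'V', 'Arg1', 'V', 'Arg1', 'LOC']
```

In this hypothetical example the man (region 1) and the beach (region 3) are shared, so the sub-roles of both events land between them, roughly matching a caption such as "a man riding a horse and holding a rope on the beach".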

Model architecture:

Region features from Faster R-CNN (ResNet-101 backbone); captioning models: Controllable LSTM, Controllable UpDn, and SCT.

Original paper: https://arxiv.org/abs/2103.12204
