多层视频语义概念分析与理解

Analysis and Understanding for Multi-Level Video Semantic Concepts

  • 摘要: 基于统计学理论,提出了一种视频多粒度语义分析的通用方法,使得多层次语义分析与多模式信息融合得到统一.为了对时域内容进行表示,首先提出一种具有时间语义语境约束的关键帧选取策略和注意力选择模型;在基本视觉语义识别后,采用一种多层视觉语义分析框架来抽取视觉语义;然后应用隐马尔可夫模型(HMM)和贝叶斯决策进行音频语义理解;最后用一种具有两层结构的仿生多模式融合方案进行语义信息融合.实验结果表明,该方法能有效融合多模式特征,并提取不同粒度的视频语义.

     

    Abstract: Based on statistical theory, this paper proposes a generic method for multi-granularity video semantic analysis that unifies multi-level semantic analysis and multi-modal information fusion. To represent temporal content, a key-frame selection strategy constrained by temporal semantic context and an attention selection model are first presented. After basic visual semantics are recognized, a multi-level visual semantic analysis framework is used to extract visual semantics. Hidden Markov models (HMMs) and Bayesian decision are then applied to audio semantic understanding. Finally, a bionic multimodal fusion scheme with a two-level structure fuses the semantic information. Experimental results show that the proposed method effectively fuses multimodal features and extracts video semantics at different granularities.
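
    The abstract does not specify how the HMMs and the Bayesian decision are combined for audio semantic understanding. One common arrangement, shown here only as a minimal sketch under assumptions, trains one discrete-observation HMM per audio class, scores a quantized feature sequence with the forward algorithm, and picks the class with the highest posterior. The class labels ("speech", "music"), all parameter values, and the function names are hypothetical and do not come from the paper.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs   : list of observation symbol indices
    start : start[s] is the initial probability of state s
    trans : trans[p][s] is the probability of moving from state p to s
    emit  : emit[s][o] is the probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def bayes_decide(obs, models, priors):
    """Pick the class maximizing log prior + log likelihood (MAP decision)."""
    scores = {
        label: math.log(priors[label]) + forward_log_likelihood(obs, *params)
        for label, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-state models over a two-symbol alphabet (assumed values).
MODELS = {
    "speech": ([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.8, 0.2]]),
    "music": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.1, 0.9], [0.2, 0.8]]),
}
PRIORS = {"speech": 0.5, "music": 0.5}

if __name__ == "__main__":
    label, scores = bayes_decide([0, 0, 1, 0], MODELS, PRIORS)
    print(label, scores)
```

    In this sketch the priors enter only as an additive log term, so changing the class priors shifts the decision boundary without retraining the per-class models.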

     
