FG-ECVG: Fine-Grained Emotion-Controllable Video Generation Algorithm

    Abstract: Emotion-guided multimedia content generation is an important part of advancing controllable AI-generated content, and it offers distinctive value in enriching the ways the public can express emotions and opinions. To address the ambiguous emotional attributes and weak interactivity of visual content generated by large models, this work proposes FG-ECVG, a video generation algorithm based on text instruction optimization that automatically generates video content from text instructions with high emotional controllability and strong interactivity. First, a guidance dictionary is constructed from the valence-arousal-dominance (VAD) emotion model; the input text is analyzed for emotional polarity and matched against emotion guidance words from this dictionary, which controls the emotion of the overall visual atmosphere. Second, a visual detail expansion framework is built on retrieval-augmented generation; it adds structured, human-like emotional visual elements to the user's text instruction, improving the emotional granularity of the generated content. Six-class emotional content was generated for five scene categories from the EmoSet dataset and assessed with subjective and objective micro-video evaluations. The results show that, compared with using a generative visual large model alone, the proposed algorithm produces video content with stronger emotional expressiveness: two-class and six-class emotion classification accuracy improve by 23.33 and 20.00 percentage points, respectively. Compared with recent visual emotion transfer or generation algorithms, six-class emotion classification accuracy improves by 26.67 percentage points on average, demonstrating the effectiveness and superiority of the algorithm.
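
The two stages described in the abstract (dictionary-based emotional guidance, then retrieval-based detail expansion) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the dictionary contents, the retriever, or the prompt format, so the lexicon values, the guidance words, the detail corpus, and the function names `polarity`, `retrieve_details`, and `expand_prompt` are hypothetical. Simple keyword overlap stands in for the retrieval step, and no generative model is called.

```python
import re

# Hypothetical VAD entries: word -> (valence, arousal, dominance), each in [0, 1].
VAD_LEXICON = {
    "joyful": (0.95, 0.75, 0.70),
    "serene": (0.85, 0.20, 0.60),
    "gloomy": (0.15, 0.30, 0.25),
    "terrifying": (0.05, 0.90, 0.20),
}

# Hypothetical guidance words that set the overall visual atmosphere.
GUIDANCE_WORDS = {
    "positive": ["warm golden light", "vivid colors"],
    "negative": ["cold desaturated tones", "heavy shadows"],
}

# Hypothetical retrieval corpus of emotion-bearing visual details per scene.
DETAIL_CORPUS = [
    {"keywords": {"beach", "sea"}, "polarity": "positive",
     "detail": "children laughing and running along the shore"},
    {"keywords": {"beach", "sea"}, "polarity": "negative",
     "detail": "an empty shoreline under a grey overcast sky"},
    {"keywords": {"street", "city"}, "polarity": "positive",
     "detail": "friends chatting outside a brightly lit cafe"},
    {"keywords": {"street", "city"}, "polarity": "negative",
     "detail": "rain-soaked pavement with a lone figure"},
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def polarity(text):
    """Classify polarity from the mean valence of lexicon words in the text."""
    valences = [VAD_LEXICON[t][0] for t in tokenize(text) if t in VAD_LEXICON]
    if not valences:
        return "neutral"
    return "positive" if sum(valences) / len(valences) >= 0.5 else "negative"


def retrieve_details(text, pol, k=2):
    """Return up to k details whose keywords overlap the text and match pol."""
    tokens = tokenize(text)
    scored = [
        (len(entry["keywords"] & tokens), entry["detail"])
        for entry in DETAIL_CORPUS
        if entry["polarity"] == pol
    ]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [detail for _, detail in scored[:k]]


def expand_prompt(text):
    """Append guidance words and retrieved details to the user instruction."""
    pol = polarity(text)
    parts = [text.strip()] + GUIDANCE_WORDS.get(pol, []) + retrieve_details(text, pol)
    return ", ".join(parts)


if __name__ == "__main__":
    print(expand_prompt("a gloomy street"))
```

In this sketch an instruction with no lexicon word is treated as neutral and passed through unchanged; a fuller version would presumably also use arousal and dominance to distinguish the six emotion classes, which the toy polarity rule here does not attempt.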

     
