Citation: NIU Kai, WANG Peng. Survey on the Research Progress and Development Trend of Vision-and-Language Navigation[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(12): 1815-1827. DOI: 10.3724/SP.J.1089.2022.19249

Survey on the Research Progress and Development Trend of Vision-and-Language Navigation

    Abstract: Vision-and-language navigation is an emerging research topic that has developed rapidly in recent years, and it is one of the representative tasks in the frontier field of vision-language interaction. Its goal is autonomous navigation, based on visual perception of the environment, that follows language instructions given by humans. This paper reviews recent progress in vision-and-language navigation. First, it introduces the task and analyzes its three main challenges: cross-modal semantic alignment, semantic understanding and reasoning, and improving model generalization. Second, it lists the commonly used datasets and evaluation metrics. Third, it summarizes research progress under four categories (imitation learning, reinforcement learning, self-supervised learning, and other methods) and compares the results of representative methods. Fourth, it discusses two current research trends: navigation in continuous environments, and understanding of high-level complex instructions with commonsense reasoning. Finally, it discusses future directions such as 3D vision-and-language navigation, fuzzy navigation, navigation with environment interaction, embodied question answering, and interactive question answering.

     
