Visual Retrieval System for Open Source Software Projects

  • Abstract: The rapid growth of the open source software ecosystem keeps expanding the scale of open source projects, yet users struggle to efficiently locate the projects they need among the large result sets returned by open source platforms. To improve retrieval efficiency, we design and implement VisRepo, a visual retrieval system for open source software projects. First, based on large-scale GitHub project data, we supplement project information along four aspects (semantic relevance, technical relevance, usability, and understandability) to reveal project characteristics comprehensively. Second, we use large language models to perform hierarchical topic modeling on project semantic information, and we combine semantic vectors with a hybrid keyword retrieval strategy to compute the similarity between projects, which enables similar-project recommendation. Finally, building on the project features and topic models, we design coordinated multi-view visual analysis and interaction techniques that form a search, explore, inspect, and recommend retrieval workflow and support multi-dimensional exploration of search results. We evaluate VisRepo through case studies and user studies; the results indicate that VisRepo helps users quickly locate projects of interest among large numbers of open source projects, supporting its effectiveness and practicality.
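The abstract does not give the formula VisRepo uses to combine semantic vectors with keyword retrieval. As a rough illustration of the general idea only, the sketch below blends cosine similarity over embedding vectors with Jaccard overlap over keyword sets. The weight `alpha`, the choice of Jaccard overlap, the dictionary layout, and every function name here are assumptions made for this example, not details taken from the system.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no semantic signal
    return dot / (norm_a * norm_b)


def keyword_overlap(keywords_a, keywords_b):
    """Jaccard overlap between two keyword collections (assumed metric)."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def hybrid_similarity(project_a, project_b, alpha=0.7):
    """Weighted blend of semantic and keyword similarity.

    alpha is a hypothetical weight: 1.0 uses only the vectors,
    0.0 uses only the keywords.
    """
    semantic = cosine_similarity(project_a["vector"], project_b["vector"])
    lexical = keyword_overlap(project_a["keywords"], project_b["keywords"])
    return alpha * semantic + (1.0 - alpha) * lexical


def recommend(query_project, candidates, top_k=3, alpha=0.7):
    """Return the names of the top_k candidates most similar to the query."""
    scored = [
        (hybrid_similarity(query_project, candidate, alpha), candidate["name"])
        for candidate in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]
```

Each project is represented here as a dictionary with `name`, `vector`, and `keywords` entries; in practice the vectors would come from an embedding model and the keywords from project metadata such as repository topics. Blending the two signals lets an exact keyword match compensate for a weak embedding match and vice versa, which is the usual motivation for hybrid retrieval.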

     
