Abstract:
The rapid expansion of open source software ecosystems makes it difficult for users to locate suitable projects among massive search results. To improve retrieval efficiency, we design and implement VisRepo, a visual retrieval system for open source software projects. First, we enrich project information using large-scale GitHub data along four aspects: semantic relevance, technical relevance, usability, and understandability, so that project characteristics are revealed comprehensively. Second, we apply large language models to perform hierarchical topic modeling of project semantics, and we combine semantic vector retrieval with keyword matching to compute inter-project similarity for similar-project recommendation. Finally, we design coordinated multi-view visualizations and interaction techniques based on project features and the topic model, establishing a search-explore-inspect-recommend workflow for multi-dimensional exploration. Case studies and user studies show that VisRepo helps users quickly locate projects of interest, supporting its effectiveness and practicality.
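
As a rough illustration of how semantic vector retrieval and keyword matching might be combined into a single inter-project similarity score, the following Python sketch blends the cosine similarity of project embeddings with the Jaccard overlap of keyword sets. This is not VisRepo's implementation: the weighting scheme, the `alpha` parameter, the function names, and the project record layout (`name`, `embedding`, `keywords`) are all assumptions made for illustration, and the embeddings are assumed to be precomputed elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard overlap of two keyword collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_similarity(p, q, alpha=0.7):
    """Weighted blend of semantic and keyword similarity.

    `alpha` is a hypothetical weight, not a value taken from the paper.
    """
    semantic = cosine(p["embedding"], q["embedding"])
    lexical = jaccard(p["keywords"], q["keywords"])
    return alpha * semantic + (1 - alpha) * lexical


def recommend(query, projects, k=3, alpha=0.7):
    """Return the names of the k projects most similar to `query`."""
    scored = [
        (hybrid_similarity(query, p, alpha), p["name"])
        for p in projects
        if p["name"] != query["name"]  # never recommend the query project itself
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    # Toy records with made-up embeddings and keywords.
    projects = [
        {"name": "graph-viz", "embedding": [0.9, 0.1], "keywords": ["graph", "visualization"]},
        {"name": "net-plot", "embedding": [0.8, 0.2], "keywords": ["graph", "plot"]},
        {"name": "kv-store", "embedding": [0.1, 0.9], "keywords": ["database", "storage"]},
    ]
    print(recommend(projects[0], projects, k=2))
```

A linear blend is only one plausible design; alternatives such as rank fusion of the two result lists would fit the same description equally well.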