Citation: Yang Weikai, Chen Changjian, Zhu Jiangning, Li Lei, Liu Peng, Liu Shixia. A Survey of Visual Analytics Research for Improving Training Data Quality[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(11): 1629-1642. DOI: 10.3724/SP.J.1089.2023.2023-00321

A Survey of Visual Analytics Research for Improving Training Data Quality

Abstract: In machine learning applications, the quality of training data is hard to guarantee because the data come from many sources and some annotators lack expertise. By tightly integrating machine learning and visualization, visual analytics brings humans into the loop of data quality analysis and improvement, which raises the quality of training data and in turn improves model performance. In this survey, we first summarize the three main types of training data quality issues: inaccurate annotations, low coverage, and insufficient annotations. Based on these types, we categorize and review the relevant visual analytics work, including methods for correcting inaccurate annotations, reducing dataset biases, and improving the quality of unlabeled data. Finally, we analyze the opportunities and challenges in visual-analytics-based training data quality improvement, including improving data quality in scenarios such as complex tasks, large language models, multimodal data, and streaming data.

     
