Abstract:
Scatterplots are a common tool for visual analysis. As datasets grow, overlapping points become a key obstacle to extracting insights from scatterplots. Existing overlap removal algorithms lose data information and visual clarity when applied to large-scale, high-density datasets. To address these problems, we propose DAFR, a scatterplot overlap removal method that integrates data aggregation and force-repulsion. First, the space is divided into a regular grid, and the points to be aggregated are determined from the number of grid cells, data scale, point categories, and point distribution. Then, each point to be aggregated is merged with its nearest neighbor of the same category, and the size of each visual unit is assigned according to the number of points it aggregates. Finally, the positions of the data points are adjusted iteratively according to the force-repulsion principle, subject to displacement constraints, until the overlap rate falls below the overlap threshold. Experiments on 29 datasets of varying scale and distribution show that DAFR performs well on five objective metrics: displacement minimization, k-nearest-neighbor preservation, shape preservation, density preservation, and overall similarity. Case studies and subjective experiments further confirm the method’s practicality and effectiveness in visual analysis tasks.
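The final step, iterative repulsion under a displacement constraint, can be illustrated with a minimal sketch. This is not the DAFR implementation: the abstract does not specify the force model, the overlap-rate definition, or the form of the constraint, so everything below is an assumption made for illustration. In particular, points are drawn as circles, the overlap rate is taken to be the fraction of point pairs whose circles intersect, overlapping pairs are pushed apart symmetrically, and each point is clamped to within `max_shift` of its original position. The names `overlap_rate`, `repel`, `max_shift`, and `threshold` are hypothetical.

```python
import math


def overlap_rate(points, radii):
    """Assumed metric: fraction of point pairs whose circles intersect."""
    n = len(points)
    if n < 2:
        return 0.0
    overlapping = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            if dist < radii[i] + radii[j]:
                overlapping += 1
    return overlapping / (n * (n - 1) / 2)


def repel(points, radii, threshold=0.0, max_shift=0.5, max_iter=1000):
    """Push overlapping circles apart until the overlap rate is at most
    `threshold` or `max_iter` passes have run (illustrative force model)."""
    origin = [tuple(p) for p in points]
    pos = [list(p) for p in points]
    n = len(pos)
    eps = 1e-9  # small margin so separated circles do not merely touch
    for _ in range(max_iter):
        if overlap_rate(pos, radii) <= threshold:
            break
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy)
                need = radii[i] + radii[j]
                if dist >= need:
                    continue
                # Coincident points have no direction; pick the x-axis.
                ux, uy = (dx / dist, dy / dist) if dist > 0 else (1.0, 0.0)
                push = (need - dist) / 2 + eps
                pos[i][0] += ux * push
                pos[i][1] += uy * push
                pos[j][0] -= ux * push
                pos[j][1] -= uy * push
        # Displacement constraint: keep each point near where it started.
        for i in range(n):
            dx = pos[i][0] - origin[i][0]
            dy = pos[i][1] - origin[i][1]
            shift = math.hypot(dx, dy)
            if shift > max_shift:
                scale = max_shift / shift
                pos[i][0] = origin[i][0] + dx * scale
                pos[i][1] = origin[i][1] + dy * scale
    return [tuple(p) for p in pos]
```

Under these assumptions, calling `repel([(0.0, 0.0), (0.1, 0.0)], [0.1, 0.1])` moves the two circles apart along the x-axis until they no longer intersect. When the displacement bound is too tight for the local density, the loop stops at `max_iter` with overlap remaining; reducing the number of visual units beforehand, as the aggregation step does, is one way to ease that tension.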