Abstract:
Gaze prediction has significant application value in fields such as human-computer interaction, medical diagnosis, advertising research, and game design. However, current methods for predicting gaze in virtual scenes typically rely on generalized models and leave considerable room for improvement on specific interactive tasks. This paper focuses on improving gaze prediction for a common interactive pattern in virtual scenes: finding, approaching, and touching objects. We first construct the first dataset for this task, consisting of gaze recordings; object, headset, and controller parameters; and recorded videos, collected while 21 users performed five interaction tasks in three interactive scenes. Each user's task completion is divided into three stages: (1) finding the target object; (2) locking onto the target object; and (3) approaching the target object. We then conduct a Spearman correlation analysis at each stage and select the parameter set most strongly correlated with gaze as input to the network for training. The proposed method is validated on the constructed dataset, achieving a gaze prediction error of 2.60°, a 21.45% improvement over the current best method's error of 3.31°, significantly enhancing gaze prediction accuracy for this task.
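To illustrate the stage-wise selection step, the following is a minimal sketch of correlation-based parameter screening, assuming a hypothetical `features` mapping of per-frame parameter series and a per-frame `gaze` series for one stage; the function name, inputs, and the `top_k` cutoff are illustrative, not the paper's actual pipeline:

```python
import numpy as np
from scipy.stats import spearmanr

def select_parameters(features: dict[str, np.ndarray],
                      gaze: np.ndarray,
                      top_k: int = 5) -> list[str]:
    """Rank candidate parameters by |Spearman rho| with gaze; keep the top k."""
    scores = {}
    for name, series in features.items():
        rho, _pvalue = spearmanr(series, gaze)
        scores[name] = abs(rho)  # strength of monotonic association, sign ignored
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical per-stage usage: repeat the selection for each of the three
# stages (finding, locking, approaching), yielding one input set per stage.
# stage_inputs = {s: select_parameters(stage_feats[s], stage_gaze[s]) for s in stages}
```

In this sketch, running the selection separately per stage mirrors the paper's observation that different parameters correlate with gaze at different points in the task.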