Abstract:
To better model the associations among the different types of content information in recipes, we propose a cross-modal multi-view self-supervised heterogeneous graph network for personalized food recommendation. First, we incorporate users, recipes, and ingredients into a heterogeneous graph and model the complex hierarchical relationships among them via message passing. Second, to better capture the associations among the multi-modal information of food and promote interaction across modalities, we exploit the relations among three kinds of food information (recipe nodes, ingredient nodes, and recipe images) to construct a cross-modal multi-view self-supervised learning task. Third, an attention module guided by the user representation integrates the multi-modal recipe features into a comprehensive recipe representation. Finally, recommendation is performed by measuring the similarity between the user representation and the comprehensive recipe representation. Experimental results on a large-scale food recommendation dataset show that the proposed method outperforms the strongest baseline, HAFR, by 6.35%, 8.13%, and 11.7% in AUC, NDCG@10, and Recall@10, respectively, verifying the effectiveness of our method.
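
As a rough illustration of the cross-modal self-supervised objective described above, the sketch below implements a standard symmetric InfoNCE-style contrastive loss between two views of the same recipe (for instance, a graph-derived recipe embedding and an image embedding). This is a hedged sketch under our own assumptions: the function name cross_modal_infonce, the temperature value, and the choice of InfoNCE are illustrative, not the paper's actual objective.

import torch
import torch.nn.functional as F

def cross_modal_infonce(view_a: torch.Tensor, view_b: torch.Tensor,
                        temperature: float = 0.2) -> torch.Tensor:
    # Illustrative contrastive loss (assumed form, not the paper's exact objective):
    # pull together two views of the same recipe (e.g. graph embedding vs. image
    # embedding) and push apart views of different recipes within the batch.
    a = F.normalize(view_a, dim=-1)           # (batch, dim)
    b = F.normalize(view_b, dim=-1)           # (batch, dim)
    logits = a @ b.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # matched pairs on the diagonal
    # Symmetric loss: each view should retrieve its paired counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))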
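
Similarly, a minimal sketch of the user-guided attention fusion and similarity-based scoring might look as follows. The class name UserGuidedAttention, the bilinear attention score, and the inner-product similarity are our own assumptions for illustration; the paper's attention module may be parameterized differently.

import torch
import torch.nn as nn
import torch.nn.functional as F

class UserGuidedAttention(nn.Module):
    # Fuse multi-modal recipe features with attention weights conditioned on
    # the user representation (illustrative sketch, not the paper's exact design).
    def __init__(self, dim: int):
        super().__init__()
        # Bilinear score between the user vector and each modality feature.
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, user: torch.Tensor, modal_feats: torch.Tensor) -> torch.Tensor:
        # user:        (batch, dim)
        # modal_feats: (batch, n_modalities, dim), e.g. graph/ingredient/image views
        n = modal_feats.size(1)
        user_exp = user.unsqueeze(1).expand(-1, n, -1).contiguous()  # (batch, n, dim)
        logits = self.score(user_exp, modal_feats)                   # (batch, n, 1)
        weights = F.softmax(logits, dim=1)                           # attention over modalities
        return (weights * modal_feats).sum(dim=1)                    # (batch, dim) fused recipe vector

def recommend_score(user: torch.Tensor, recipe: torch.Tensor) -> torch.Tensor:
    # Recommendation score as inner-product similarity per user-recipe pair.
    return (user * recipe).sum(dim=-1)

# Toy usage with random features: 4 users, 3 modalities, 64-dim embeddings.
if __name__ == "__main__":
    att = UserGuidedAttention(dim=64)
    users = torch.randn(4, 64)
    feats = torch.randn(4, 3, 64)
    fused = att(users, feats)
    print(recommend_score(users, fused).shape)  # torch.Size([4])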