Abstract:
Self-supervised classification of breast tissue pathology images can assist pathologists in screening breast cancer patients. Current self-supervised learning methods learn image feature representations by constructing auxiliary tasks. However, features extracted in this way tend to be tailored to the auxiliary tasks rather than to the characteristic information of the pathological images themselves, which degrades the model's performance on downstream tasks. To address this issue, this paper proposes a Transformer-based self-supervised classification method for breast tissue pathology images. A feature extraction network, DenseSwinNet, is designed to combine the ability of convolutional neural networks to perceive local information with the ability of vision Transformers to perceive global information in pathological images. In addition, a classifier based on clustering and self-supervision is constructed to aggregate the local and global features of breast tissue pathology images and predict whether the tissue is cancerous. On the breast tissue pathology image classification task of the publicly available Camelyon16 dataset, the proposed method achieves an accuracy of 0.9016, an F1-Score of 0.857, and an AUC of 0.9247. These experimental results demonstrate the effectiveness of the proposed method in improving classification performance. Additionally, a visual analysis of the regions the model focuses on supports its interpretability.
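For readers unfamiliar with the three reported metrics, the following minimal sketch shows how accuracy, F1-Score, and AUC are conventionally computed for a binary (cancerous vs. non-cancerous) classifier. It is illustrative only and is not the paper's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = cancerous."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold(scores: Sequence[float], cutoff: float = 0.5) -> List[int]:
    """Turn predicted probabilities into hard labels (cutoff is an assumption)."""
    return [1 if s >= cutoff else 0 for s in scores]


if __name__ == "__main__":
    # Toy data, invented for illustration; not drawn from Camelyon16.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
    y_pred = threshold(scores)
    print("accuracy:", accuracy(y_true, y_pred))
    print("F1-Score:", f1_score(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

Accuracy and F1-Score depend on the chosen decision threshold, whereas AUC is computed from the raw scores and is threshold-independent, which is why the three values are usually reported together.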