Abstract:
By retrieving patient data across multiple modalities such as images and text, physicians can analyze a patient's disease progression more comprehensively and in greater depth, and thereby make more precise clinical diagnoses and treatment decisions. Given the rapid growth of medical data and the demands for low storage cost and efficient retrieval, hashing methods have become a popular paradigm for cross-modal patient retrieval. However, existing cross-modal hashing methods remain deficient in exploiting the multi-label information of patient diagnostic categories and in collaborative modeling between modalities. We therefore propose graph-enhanced hashing with cross-modal attention (GEHCA) for cross-modal patient retrieval. First, a graph-information-enhanced label encoder is designed: it constructs a label graph from the co-occurrence relationships of diagnostic category labels and extracts multi-label embedding representations of patients through a graph convolutional network. Second, to alleviate the sparsity of the multi-label matrix and enrich the semantic information of the fused representations, the model introduces a cross-modal attention fusion mechanism among image, text, and multi-label embeddings. Finally, to integrate the multi-label co-occurrence relationships and the semantic information of diagnostic categories into the hash codes of patient image and text data, and to further enhance the discriminative power of the hash codes, a similarity loss function that exploits intra-modal and inter-modal associations is designed. Experiments on the MIMIC-CXR dataset of chest X-ray images and diagnostic reports show that, compared with three classical cross-modal hashing methods and four advanced deep cross-modal hashing methods, GEHCA achieves consistent performance advantages across hash code lengths. On the task of retrieving diagnostic reports from X-ray images, the proposed model improves mean average precision over the second-best method by 2.98%, 1.21%, 0.63%, and 0.53% at hash code lengths of 16, 32, 64, and 128 bits, respectively. On the task of retrieving X-ray images from diagnostic reports, it improves mean average precision over the second-best method by 0.91%, 0.75%, 1.03%, and 0.57% at the same four code lengths.
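The abstract outlines three components (a GCN label encoder over a label co-occurrence graph, cross-modal attention fusion, and a similarity-based hashing loss). The snippet below is only a minimal PyTorch sketch of the first two components under assumed design choices; it is not the authors' implementation, and every name, dimension, and construction detail (LabelGCN, CrossModalAttentionFusion, the symmetric normalization of the co-occurrence matrix, the 64-bit projection) is an illustrative assumption, since the abstract does not specify them.

```python
# Minimal sketch (assumed names/shapes, not the paper's implementation):
# a GCN over a label co-occurrence graph produces multi-label embeddings,
# which are fused with image/text features via cross-modal attention
# before relaxation into hash codes with tanh.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelGCN(nn.Module):
    """Two-layer GCN over the label co-occurrence graph (assumed design)."""
    def __init__(self, num_labels, in_dim, hid_dim, out_dim):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(num_labels, in_dim))  # learnable label inputs
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, out_dim)

    def forward(self, adj):
        # adj: (num_labels, num_labels) normalized co-occurrence adjacency
        h = F.relu(adj @ self.w1(self.embed))
        return adj @ self.w2(h)                                  # (num_labels, out_dim)


class CrossModalAttentionFusion(nn.Module):
    """Image/text features attend over the label embeddings (assumed design)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat, label_emb):
        # feat: (batch, dim); label_emb: (num_labels, dim)
        q = feat.unsqueeze(1)                                    # (batch, 1, dim)
        kv = label_emb.unsqueeze(0).expand(feat.size(0), -1, -1)
        fused, _ = self.attn(q, kv, kv)                          # (batch, 1, dim)
        return feat + fused.squeeze(1)                           # residual fusion


def normalized_cooccurrence(label_matrix, eps=1e-6):
    """Symmetrically normalized adjacency from a binary multi-label matrix."""
    co = label_matrix.T.float() @ label_matrix.float()           # raw co-occurrence counts
    co.fill_diagonal_(1.0)                                       # self-loops
    deg = co.sum(dim=1)
    d_inv_sqrt = torch.diag((deg + eps).pow(-0.5))
    return d_inv_sqrt @ co @ d_inv_sqrt


if __name__ == "__main__":
    num_labels, dim = 14, 256                                    # e.g. 14 chest X-ray finding labels
    labels = (torch.rand(32, num_labels) > 0.8).long()           # toy patient multi-label matrix
    adj = normalized_cooccurrence(labels)

    label_emb = LabelGCN(num_labels, dim, dim, dim)(adj)
    fuse = CrossModalAttentionFusion(dim)
    hash_proj = nn.Linear(dim, 64)                               # 64-bit code length (illustrative)
    img_feat, txt_feat = torch.randn(32, dim), torch.randn(32, dim)
    img_code = torch.tanh(hash_proj(fuse(img_feat, label_emb)))  # relaxed image hash codes
    txt_code = torch.tanh(hash_proj(fuse(txt_feat, label_emb)))  # relaxed text hash codes
    print(img_code.shape, txt_code.shape)                        # torch.Size([32, 64]) each
```

In a full model, the relaxed codes of both modalities would additionally be driven by the intra-modal and inter-modal similarity loss mentioned in the abstract before binarization with the sign function.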