Graph Enhanced Hashing Networks for Cross-modal Patient Retrieval


    Abstract: By retrieving patient data across multiple modalities such as images and text, physicians can analyze a patient's disease progression more comprehensively and in greater depth, and thus make more precise clinical diagnosis and treatment decisions. With the rapid growth of medical data and the accompanying demands for low storage cost and efficient retrieval, hashing methods have become a popular paradigm for cross-modal patient retrieval. However, existing cross-modal hashing methods fall short in exploiting the multi-label information of patient diagnostic categories and in modeling the modalities collaboratively. We therefore propose graph enhanced hashing with cross-modal attention (GEHCA) for cross-modal patient retrieval. First, a graph-information-enhanced label encoder is designed: it builds a label graph from the co-occurrence relationships of diagnostic category labels and extracts multi-label embedding representations of patients through a graph convolutional network. Second, to alleviate the sparsity of the multi-label matrix and enrich the semantics of the fused representations, a cross-modal attention fusion mechanism is introduced among the image, text, and multi-label embedding representations. Finally, to integrate the multi-label co-occurrence relationships and semantic information of the diagnostic categories into the hash codes of patient image and text data, and to further strengthen the discriminative power of those codes, a similarity loss function that exploits complementary intra-modal and inter-modal associations is designed. Experiments on MIMIC-CXR, a dataset of chest X-ray images and diagnostic reports, show that GEHCA outperforms three classical and four state-of-the-art deep cross-modal hashing methods across hash code lengths. On retrieving diagnostic reports with X-ray images, GEHCA improves mean average precision over the second-best method by 2.98%, 1.21%, 0.63%, and 0.53% at hash code lengths of 16, 32, 64, and 128 bits, respectively; on retrieving X-ray images with diagnostic reports, the improvements at the same four lengths are 0.91%, 0.75%, 1.03%, and 0.57%.
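
The abstract names the components of the label encoder without implementation details. The following is a minimal PyTorch-style sketch, not the paper's actual code: a label graph is built from diagnosis co-occurrence statistics and passed through a two-layer graph convolutional network. The function and class names (build_cooccurrence_adj, GCNLabelEncoder), the threshold tau, the learned node features, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of a graph-enhanced label encoder; names, threshold,
# and dimensions are assumptions, not details taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_cooccurrence_adj(labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """labels: (N, C) binary (float) multi-label matrix over N patients and
    C diagnostic categories. Returns a symmetrically normalized (C, C)
    co-occurrence adjacency matrix."""
    co = labels.T @ labels                         # raw pairwise co-occurrence counts
    freq = labels.sum(dim=0).clamp(min=1.0)        # per-label frequency
    cond = co / freq[None, :]                      # cond[i, j] ~ P(label_i | label_j)
    adj = ((cond > tau).float()                    # drop weak co-occurrences
           + torch.eye(labels.shape[1])).clamp(max=1.0)  # add self-loops
    d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

class GCNLabelEncoder(nn.Module):
    """Two-layer GCN mapping label nodes to multi-label embedding vectors."""
    def __init__(self, num_labels: int, in_dim: int, out_dim: int):
        super().__init__()
        self.node_feat = nn.Parameter(torch.randn(num_labels, in_dim))
        self.w1 = nn.Linear(in_dim, out_dim)
        self.w2 = nn.Linear(out_dim, out_dim)

    def forward(self, adj: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.w1(adj @ self.node_feat))  # propagate over the label graph
        return self.w2(adj @ h)                    # (C, out_dim) label embeddings
```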
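Likewise, a hedged sketch of how the cross-modal attention fusion and the intra-/inter-modal similarity loss could fit together. The fusion topology (attending from an image or text feature to the label embeddings), the scaled dot-product scoring, and the negative log-likelihood form of the pairwise loss are assumptions chosen for illustration; the paper's exact formulation may differ.

```python
# Illustrative sketch only: CrossModalAttentionFusion and
# pairwise_similarity_loss are hypothetical names and formulations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttentionFusion(nn.Module):
    """Attend from a modality feature (image or text) to the label embeddings,
    fuse the label-aware context back in, and emit relaxed hash codes."""
    def __init__(self, feat_dim: int, label_dim: int, hash_bits: int):
        super().__init__()
        self.q = nn.Linear(feat_dim, label_dim)
        self.fuse = nn.Linear(feat_dim + label_dim, feat_dim)
        self.hash = nn.Linear(feat_dim, hash_bits)

    def forward(self, feat: torch.Tensor, label_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, feat_dim); label_emb: (C, label_dim)
        scores = self.q(feat) @ label_emb.T / label_emb.shape[1] ** 0.5
        attn = torch.softmax(scores, dim=-1)       # (B, C) attention over labels
        ctx = attn @ label_emb                     # (B, label_dim) label context
        fused = F.relu(self.fuse(torch.cat([feat, ctx], dim=-1)))
        return torch.tanh(self.hash(fused))        # relaxed codes in (-1, 1)

def pairwise_similarity_loss(h_a: torch.Tensor, h_b: torch.Tensor,
                             sim: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood loss on pairwise similarity. With h_a and h_b
    from the same modality this is an intra-modal term; from different
    modalities, an inter-modal term. sim is a binary (B, B) similarity matrix."""
    theta = 0.5 * h_a @ h_b.T                      # (B, B) code inner products
    return (F.softplus(theta) - sim * theta).mean()
```

Under these assumptions, a training objective would sum the loss over the image-image, text-text, and image-text code pairs so that the intra-modal and inter-modal associations complement each other, as the abstract describes.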
