
A Survey on the Explainability of Large Language Models

  • Abstract: Large language models have gained prominence for their outstanding task-solving capabilities. Evolving from basic language modeling and text generation to complex reasoning tasks, these models have transitioned from general-purpose to specialized capabilities and are gradually being deployed across application scenarios through interaction with users. Despite their unprecedented and profound impact, large language models are still criticized for explainability concerns such as the low transparency of their internal mechanisms and unresolved ethical considerations; more explainability research is needed to lift their "veil," thereby improving their adaptability to downstream tasks and the user experience. Because the outputs of large language models are shaped by both the interaction process and the norms of their application domains, this paper innovatively surveys existing explainability work from two complementary perspectives, the model and the user, decomposing explainability into the transparency of the model's decision process, the controllability of user interaction, and the credibility of model outputs. Specifically, we first start from the model itself and categorize existing explanation methods inside and outside the model, anchored respectively in training and fine-tuning techniques and in enhancement techniques. Next, from the perspective of human-computer interaction, we review research that guides model decisions through user-supplied prompts to enhance explainability. Finally, we outline the limitations of current explainability research on large language models and offer prospects for its future development.


