面向机加工艺规程文本的实体识别模型

Named Entity Recognition Method for Process Planning Text

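  • Abstract: To identify key information in unstructured process planning text efficiently, a named entity recognition model based on a machining domain dictionary and neural networks is established. First, the machining domain dictionary is combined with jieba word segmentation to annotate the dataset automatically; when process parameters are annotated, a number and its marker letters are treated as a single segmentation unit to strengthen subsequent feature extraction. Second, on top of word2vec word embeddings, a bidirectional long short-term memory network extracts features from the text. Finally, a conditional random field integrates contextual logic to improve the recognition accuracy for key process information. The model was evaluated on a dataset containing 431 work-step descriptions. The results show that its precision, recall and F1 score are 90.20%, 93.88% and 92.00%, respectively, giving it some advantage over traditional models in the field, and three different process planning datasets were used to verify the model's robustness.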
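The dictionary-driven annotation step, in which a number and its marker letters stay together as one unit, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain forward maximum matching loop stands in for jieba, and the dictionary entries, the entity type names (`OPERATION`, `FEATURE`, `TOOL`, `PARAM`) and the parameter pattern are all hypothetical.

```python
import re

# Hypothetical mini-dictionary (term -> entity type); illustrative only,
# not the machining domain dictionary used in the paper.
DOMAIN_DICT = {
    "粗车": "OPERATION",  # rough turning
    "外圆": "FEATURE",    # outer cylindrical surface
    "车刀": "TOOL",       # turning tool
}

# Assumed shape of a process parameter: optional marker letters
# (Latin letters or the diameter sign), followed by a number.
PARAM = re.compile(r"[A-Za-zΦφ]*\d+(?:\.\d+)?")


def annotate(text, dictionary):
    """Segment text and label each unit.

    Parameters such as 'Φ20' or 'Ra3.2' are kept as a single unit.
    Everything else goes through forward maximum matching against
    the dictionary; unmatched characters are labelled 'O'.
    """
    max_len = max(map(len, dictionary))
    units, i = [], 0
    while i < len(text):
        m = PARAM.match(text, i)
        if m:
            units.append((m.group(), "PARAM"))
            i = m.end()
            continue
        # Try the longest dictionary term first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in dictionary:
                units.append((piece, dictionary[piece]))
                i += n
                break
        else:
            units.append((text[i], "O"))
            i += 1
    return units


for unit, label in annotate("粗车外圆至Φ20", DOMAIN_DICT):
    print(unit, label)
```

Keeping `Φ20` whole means the downstream sequence model sees one parameter unit rather than a letter followed by separate digits, which is the effect the abstract attributes to this annotation choice.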

     

    Abstract: To efficiently recognize critical information in unstructured process planning text, a named entity recognition model based on a machining domain dictionary and a neural network is established. First, the domain dictionary is combined with jieba word segmentation to annotate the dataset automatically; in annotating process parameter data, a number and its identifying letters are treated as a single unit, which improves subsequent feature extraction. Second, a bidirectional long short-term memory network extracts text features on top of word2vec embeddings. Finally, a conditional random field model integrates contextual logic to improve the recognition accuracy of critical process information. The proposed model is evaluated on a dataset of 431 work steps. Experimental results show that its precision, recall and F1 score are 90.20%, 93.88% and 92.00%, respectively, which gives the model a certain advantage over traditional models in the field. In addition, tests on three different process planning datasets indicate that the model is robust.
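As a quick consistency check on the reported metrics, the F1 score can be recomputed from the other two figures. The sketch below assumes that the first reported value (90.20%) is precision in the usual named entity recognition sense and that F1 is the harmonic mean of precision and recall; the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract, expressed as fractions.
f1 = f1_score(0.9020, 0.9388)
print(f"{f1:.2%}")  # prints 92.00%
```

Under that assumption the three reported numbers agree with one another.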

     
