Named Entity Recognition for Chinese Electronic Medical Records Fusing Multiple Features and Iterated Dilated Convolution

Journal of Changzhou University (Natural Science Edition) [ISSN: 2095-0411 / CN: 32-1822/N]

Volume:: 35
Issue:: 2023, No. 01

Pages:: 59-67

Section:: Computer and Information Engineering

Publication date:: 2023-01-28

Article Info

Title:: Chinese electronic medical record named entity recognition based on multi-features and IDCNN

Article ID:: 2095-0411(2023)01-0059-09

Author(s):: FENG Hongqi (封红旗)¹; SUN Yang (孙杨)¹; WU Tao (吴涛)¹; WANG Shaocong (王少聪)¹; LI Wenjie (李文杰)²,³ (1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China; 2. School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213164, China; 3. Changzhou Key Laboratory of Biomedical Information Technology, Changzhou 213164, China)

Keywords:: Chinese electronic medical record; named entity recognition; convolutional neural network; self-attention mechanism

CLC number:: TP 391

DOI:: 10.3969/j.issn.2095-0411.2023.01.008

Document code:: A

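Abstract (translated from the Chinese):: To address insufficient semantic representation of text and inefficient feature extraction in named entity recognition for Chinese electronic medical records, a named entity recognition method that fuses multiple features with iterated dilated convolution is proposed. The method first builds a character embedding algorithm based on a convolutional neural network (CNN). The resulting character vectors are fused with word vectors and other external feature information and fed into an iterated dilated convolutional neural network (IDCNN) for feature extraction. An attention mechanism is introduced to strengthen dependencies across the sequence, and a CRF finally decodes the optimal label sequence. The method achieves an F₁ value of 91.36% on the CCKS2017 Chinese electronic medical record dataset, outperforming existing methods, and also verifies that a semantic representation fusing multiple features brings some performance gain to Chinese entity recognition.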

Abstract:: To address word segmentation errors, fuzzy word boundaries, and low computational efficiency in entity recognition, a named entity recognition method for Chinese electronic medical records that combines multiple features with IDCNN is proposed. The method first builds a CNN-based character embedding algorithm to train character vectors, concatenates them with word vectors and other additional features, feeds the result into an iterated dilated convolutional neural network (IDCNN) for feature learning, and finally decodes the optimal label sequence with a CRF. Experimental results show that the method reaches an F₁ value of 91.36% on the CCKS2017 Chinese electronic medical record dataset and trains more efficiently than existing models, which verifies its effectiveness.
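
The abstract names iterated dilated convolution as the feature extractor but does not give its configuration. The following is a minimal sketch of the general idea only, not the authors' implementation: a 1-D dilated convolution in plain Python, stacked into a block that can be applied repeatedly. The dilation schedule `[1, 1, 2]`, the all-ones kernel, and the function names are illustrative assumptions; a real model would use learned multi-channel filters and nonlinearities.

```python
from typing import List, Sequence


def dilated_conv1d(xs: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """1-D dilated convolution, zero-padded so the output length equals the input length.

    Assumes an odd kernel width; tap k reads the input at offset (k - half) * dilation.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(xs)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation
            if 0 <= j < len(xs):  # positions outside the sequence count as zero
                acc += w * xs[j]
        out.append(acc)
    return out


def iterated_dilated_block(xs: Sequence[float], kernel: Sequence[float],
                           dilations: Sequence[int], iterations: int = 1) -> List[float]:
    """Apply the same stack of dilated layers `iterations` times (shared kernel)."""
    ys = list(xs)
    for _ in range(iterations):
        for d in dilations:
            ys = dilated_conv1d(ys, kernel, d)
    return ys


def receptive_field(kernel_width: int, dilations: Sequence[int], iterations: int = 1) -> int:
    """Number of input positions that can influence a single output position."""
    return 1 + iterations * (kernel_width - 1) * sum(dilations)


# Illustrative values only; the abstract does not state the actual settings.
DILATIONS = [1, 1, 2]
KERNEL = [1.0, 1.0, 1.0]

# Feed a single impulse and count how far it spreads after one block.
impulse = [0.0] * 11
impulse[5] = 1.0
response = iterated_dilated_block(impulse, KERNEL, DILATIONS)
spread = sum(1 for v in response if v != 0.0)

print(receptive_field(3, DILATIONS))  # 9
print(spread)                         # 9
```

With these assumed settings, three layers cover nine input positions, whereas three undilated layers of the same width would cover only seven. Repeating the block widens the context further without adding new kernels, which is the usual motivation for preferring this design over recurrent encoders when efficiency matters.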

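Memo

Memo:: Received: 2022-08-09.
Funding: Social Development Project of the Jiangsu Provincial Department of Science and Technology (BE2018638); Changzhou Social Development Project (CE20195025).
About the author: FENG Hongqi (b. 1966), male, from Changzhou, Jiangsu, research fellow. E-mail: hqfeng@cczu.edu.cn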
