[1] FENG Hongqi, SUN Yang, WU Tao, et al. Chinese electronic medical record named entity recognition based on multi-features and IDCNN[J]. Journal of Changzhou University (Natural Science Edition), 2023, 35(1): 59-67. [doi:10.3969/j.issn.2095-0411.2023.01.008]

Chinese electronic medical record named entity recognition based on multi-features and IDCNN
Journal of Changzhou University (Natural Science Edition) [ISSN: 2095-0411 / CN: 32-1822/N]

Volume:
Vol. 35
Issue:
2023, No. 01
Pages:
59-67
Column:
Computer and Information Engineering
Publication date:
2023-01-28

Article Info

Title:
Chinese electronic medical record named entity recognition based on multi-features and IDCNN
Article ID:
2095-0411(2023)01-0059-09
Author(s):
FENG Hongqi1, SUN Yang1, WU Tao1, WANG Shaocong1, LI Wenjie2,3
(1. School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou 213164, China; 2. School of Microelectronics and Control Engineering, Changzhou University, Changzhou 213164, China; 3. Changzhou Key Laboratory of Biomedical Information Technology, Changzhou 213164, China)
Keywords:
Chinese electronic medical record; named entity recognition; convolutional neural network; self-attention mechanism
CLC number:
TP 391
DOI:
10.3969/j.issn.2095-0411.2023.01.008
Document code:
A
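Abstract (translated from the Chinese original):
To address shortcomings such as insufficient semantic representation of text and low feature-extraction efficiency in named entity recognition for Chinese electronic medical records, a named entity recognition method that fuses multiple features with iterated dilated convolution is proposed. The method first builds a character embedding algorithm based on a convolutional neural network (CNN). The resulting character vectors are fused with external feature information such as word vectors and fed into an iterated dilated convolutional neural network (IDCNN) for feature extraction. An attention mechanism is introduced to strengthen dependencies within the sequence, and a CRF finally decodes the optimal label sequence. The method achieves an F1 score of 91.36% on the CCKS2017 Chinese electronic medical record dataset, outperforming existing methods, and also verifies that a semantic representation fusing multiple features yields a measurable improvement in Chinese entity recognition.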
Abstract:
To address word segmentation errors, fuzzy word boundaries, and low computational efficiency in entity recognition, this paper proposes a Chinese electronic medical record named entity recognition method that combines multiple features with an iterated dilated convolutional neural network (IDCNN). The method first builds a CNN-based character embedding algorithm to train character vectors, concatenates them with word vectors and other additional features, feeds the result into the IDCNN for feature learning, and finally decodes the optimal label sequence with a conditional random field (CRF). Experiments show that the method reaches an F1 score of 91.36% on the CCKS2017 Chinese electronic medical record dataset and trains more efficiently than existing models, which verifies its effectiveness.
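The iterated dilated convolution step named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the kernel width of 3, the dilation schedule `[1, 1, 2]`, and the 4 iterations are illustrative assumptions, and nonlinearities, learned weights, the attention layer, and the CRF are all omitted. The sketch only shows how reapplying one small block of dilated convolutions widens the context each character position can see.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution (cross-correlation) with zero padding."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` positions apart around t.
            idx = t + (j - center) * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def idcnn(x: Sequence[float], kernel: Sequence[float],
          dilations: Sequence[int], iterations: int) -> List[float]:
    """Apply the same block of dilated convolutions `iterations` times.

    Assumption: a single shared kernel and no activation function,
    purely to keep the illustration short.
    """
    h = list(x)
    for _ in range(iterations):
        for d in dilations:
            h = dilated_conv1d(h, kernel, d)
    return h


def receptive_field(kernel_width: int, dilations: Sequence[int], iterations: int) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + iterations * sum((kernel_width - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration only.
    dilations = [1, 1, 2]
    impulse = [0.0] * 41
    impulse[20] = 1.0
    response = idcnn(impulse, [1.0, 1.0, 1.0], dilations, iterations=4)
    touched = sum(1 for v in response if v != 0.0)
    print("positions reached:", touched)
    print("receptive field:", receptive_field(3, dilations, 4))
```

Under these assumed settings, twelve narrow convolution layers cover a context of 33 positions, which is why a dilated stack can capture long-range dependencies while remaining parallelizable across the sequence.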

Memo:
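Received: 2022-08-09.
Funding: Social Development Funding Project of the Jiangsu Provincial Department of Science and Technology (BE2018638); Changzhou Social Development Funding Project (CE20195025).
About the author: FENG Hongqi (born 1966), male, from Changzhou, Jiangsu, research fellow. E-mail: hqfeng@cczu.edu.cn
Last Update: 1900-01-01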