[Purpose/significance] To address the characteristics of medical text, this paper proposes a semantic graph-based method for multi-document automatic summarization and uses the semantic information in the graph to identify the themes of the summary. [Method/process] SemRep is used to extract normalized concepts and their semantic relations from the source documents, and a semantic graph is constructed from them. Key information is then extracted from the graph at three levels (concepts, relations, and communities) to generate the summary. Concepts are categorized through a three-level mapping from concept to semantic type to semantic group, and this categorization is combined with semantic collocation patterns (schemas) to partition the summary into themes. [Result/conclusion] Tests on datasets for five diseases show that the method effectively identifies the core content of a document collection, and that the semantic information embedded in the graph supports accurate thematic partitioning of the summary.
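The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The predications are invented toy data rather than SemRep output, and each concept is given a single semantic type for simplicity. Degree centrality is assumed as the concept-ranking criterion and 3-cliques as the community definition, neither of which the abstract specifies. The schema table `THEME_SCHEMAS` and all function names are hypothetical. The semantic type abbreviations (`phsu`, `dsyn`) and group codes (`CHEM`, `DISO`) follow the UMLS Semantic Network.

```python
from collections import defaultdict
from itertools import combinations

# Toy (subject, predicate, object) predications; invented for illustration,
# standing in for what a tool such as SemRep would extract.
PREDICATIONS = [
    ("Metformin", "TREATS", "Diabetes Mellitus"),
    ("Metformin", "TREATS", "Obesity"),
    ("Obesity", "CAUSES", "Diabetes Mellitus"),
    ("Obesity", "COEXISTS_WITH", "Hypertension"),
    ("Aspirin", "TREATS", "Hypertension"),
]

# Concept -> semantic type -> semantic group (simplified: one type per concept).
SEMANTIC_TYPE = {
    "Metformin": "phsu", "Aspirin": "phsu",
    "Diabetes Mellitus": "dsyn", "Obesity": "dsyn", "Hypertension": "dsyn",
}
SEMANTIC_GROUP = {"phsu": "CHEM", "dsyn": "DISO"}

# Hypothetical collocation schemas: (subject group, predicate, object group) -> theme.
THEME_SCHEMAS = {
    ("CHEM", "TREATS", "DISO"): "treatment",
    ("DISO", "CAUSES", "DISO"): "etiology",
    ("DISO", "COEXISTS_WITH", "DISO"): "comorbidity",
}


def build_graph(predications):
    """Build an undirected adjacency map over concepts."""
    adjacency = defaultdict(set)
    for subj, _pred, obj in predications:
        adjacency[subj].add(obj)
        adjacency[obj].add(subj)
    return dict(adjacency)


def core_concepts(adjacency, k):
    """Rank concepts by degree centrality (assumed criterion); ties break by name."""
    n = len(adjacency)
    scores = {c: len(nb) / (n - 1) if n > 1 else 0.0 for c, nb in adjacency.items()}
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


def triangles(adjacency, concepts):
    """Return 3-cliques among the given concepts as a simple community proxy."""
    return [
        trio
        for trio in combinations(sorted(concepts), 3)
        if all(b in adjacency[a] for a, b in combinations(trio, 2))
    ]


def group_of(concept):
    """Map a concept to its semantic group via its semantic type."""
    return SEMANTIC_GROUP[SEMANTIC_TYPE[concept]]


def summarize(predications, k):
    """Keep relations among the top-k concepts and bucket them by theme."""
    adjacency = build_graph(predications)
    core = set(core_concepts(adjacency, k))
    themes = defaultdict(list)
    for subj, pred, obj in predications:
        if subj in core and obj in core:
            pattern = (group_of(subj), pred, group_of(obj))
            themes[THEME_SCHEMAS.get(pattern, "other")].append((subj, pred, obj))
    return dict(themes), triangles(adjacency, core)


themes, communities = summarize(PREDICATIONS, k=4)
for theme, relations in themes.items():
    print(theme, relations)
print("communities:", communities)
```

Ranking at the semantic group level rather than the concept level keeps the schema table small: one pattern such as `("CHEM", "TREATS", "DISO")` covers every drug-disease pair, which is the practical benefit of the three-level mapping.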