Special Topic: Natural Language Processing and Text Information Analysis

Research on Patent Topics Extraction Based on Semantic Role Labeling

  • Meng Ling'en,
  • Li Ying,
  • He Yanqing,
  • Qu Peng,
  • Wang Huilin
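  • Institute of Scientific and Technical Information of China
Meng Ling'en, master's student at the Institute of Scientific and Technical Information of China; He Yanqing, associate researcher at the Institute of Scientific and Technical Information of China; Qu Peng, assistant researcher at the Institute of Scientific and Technical Information of China; Wang Huilin, researcher at the Institute of Scientific and Technical Information of China, PhD.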

Received date: 2014-07-24

  Revised date: 2014-09-04

  Online published: 2014-10-05

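Funding

This paper is one of the research outcomes of the National Natural Science Foundation of China project "Context Analysis for Statistical Machine Translation of Patent Literature" (grant No. 61303152) and the China-Japan international cooperation project "Cooperative Research on Practical Japanese-Chinese Bidirectional Machine Translation for Scientific and Technical Literature" (grant No. 2014DFA11350).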

Research on Patent Topics Extraction Based on Semantic Role Labeling

  • Meng Ling'en ,
  • Li Ying ,
  • He Yanqing ,
  • Qu Peng ,
  • Wang Huilin
  • Institute of Scientific and Technical Information of China, Beijing 100038

Received date: 2014-07-24

  Revised date: 2014-09-04

  Online published: 2014-10-05

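Abstract

Automatic topic extraction is of great significance for mining information from patent literature. This paper introduces semantic role labeling information to assist the automatic extraction of topics from patent literature, in contrast to the manual annotation or template-based approaches used by existing patent text analysis platforms. To improve semantic role labeling of patent literature, it first describes a method for automatically splitting long patent sentences into simplified sentences; second, it labels the semantic roles of the simplified sentences; finally, it combines the semantic information of the simplified sentences with a self-built list of common words annotated with semantic frames to extract topic information from patent literature and obtain the essential information, thereby confirming the practical value of this research.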

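Cite this article

Meng Ling'en, Li Ying, He Yanqing, Qu Peng, Wang Huilin. Research on Patent Topics Extraction Based on Semantic Role Labeling[J]. Library and Information Service (图书情报工作), 2014, 58(19): 19-24. DOI: 10.13266/j.issn.0252-3116.2014.19.003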

Abstract

Automatic topic extraction is crucial for mining information from patent literature. Existing patent text analysis platforms use either manual annotation or templates to find topics. This paper introduces semantic role labeling (SRL) information to help extract patent topics automatically. To improve the effectiveness of SRL on patent literature, it first introduces a method for automatic sentence simplification, then labels semantic roles for the simplified sentences, and finally combines the semantic information with a list of frequently used words annotated with semantic frames to extract patent topics. The experimental results show that the approach can extract useful knowledge from patents and demonstrate the value of this study.

