Library and Information Service
Study on the Rules of Zero Anaphora Resolution in Chinese Patent Literature
Received date: 2015-03-13
Revised date: 2015-04-20
Online published: 2015-05-05
[Purpose/significance] Patent documents exist in enormous numbers, so helping users grasp the knowledge they contain quickly and accurately is key to optimizing patent services. Zero anaphora in Chinese patent literature makes automatic knowledge identification and extraction extremely difficult, and because zero anaphora identification and resolution draw on many technologies and specialized resources, many problems remain unsolved.

[Method/process] Guided by Qualia Structure theory, semantic role theory, and Rhetorical Structure Theory, this paper derives a set of rules for zero anaphora resolution. It develops a syntactic and semantic role labeling tool and a text annotation tool, and it constructs four libraries: ① "the library of qualia structure of patent verbs", in which patent verbs are classified into four categories; ② "the library of the knowledge of argument structures", used to label patent verbs and their argument structures; ③ "the library of the rules of patent verb argument structure", used to analyze the antecedent of a zero anaphor; ④ "the library of the rhetorical structure of zero anaphora", used to analyze the situations in which the "telic role" and the "constitutive role" appear. On the basis of these libraries, five rules for zero anaphora resolution are formulated.

[Result/conclusion] The initial results have been successfully applied to the automatic processing of patent literature.
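The abstract names the libraries but does not state the five rules. As a rough illustration only, the Python sketch below shows how a verb library recording each verb's qualia category and expected subject type might drive one simple antecedent-selection rule. Everything in it is an assumption: reading the four verb categories as Pustejovsky's four qualia roles, the toy verb entries, the semantic-type labels (`device`, `component`), the names `VerbEntry`, `Clause`, and `resolve_zero_subjects`, and the rule itself (choose the nearest preceding subject whose type matches what the verb expects). It is not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: the four verb categories are read as Pustejovsky's qualia roles.
QUALIA_ROLES = {"formal", "constitutive", "telic", "agentive"}


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical verb-library entry: qualia category plus the semantic
    type the verb expects in its subject (first-argument) slot."""
    verb: str
    qualia_role: str
    subject_type: str

    def __post_init__(self) -> None:
        if self.qualia_role not in QUALIA_ROLES:
            raise ValueError(f"unknown qualia role: {self.qualia_role}")


@dataclass
class Clause:
    verb: str
    subject: Optional[str]              # None marks a zero (omitted) subject
    subject_type: Optional[str] = None  # semantic type of an explicit subject


# Toy entries for illustration only; not taken from the paper's libraries.
VERB_LIBRARY = {
    "包括": VerbEntry("包括", "constitutive", "device"),  # "comprise"
    "用于": VerbEntry("用于", "telic", "device"),         # "be used for"
    "连接": VerbEntry("连接", "formal", "component"),     # "connect"
}


def resolve_zero_subjects(clauses: List[Clause]) -> List[Tuple[str, Optional[str]]]:
    """Give each zero subject the nearest preceding subject whose semantic
    type matches the verb's expected subject type (a hypothetical rule)."""
    history: List[Tuple[str, Optional[str]]] = []  # (subject, type), oldest first
    resolved: List[Tuple[str, Optional[str]]] = []
    for clause in clauses:
        subject, stype = clause.subject, clause.subject_type
        if subject is None:
            entry = VERB_LIBRARY.get(clause.verb)
            for cand, ctype in reversed(history):
                # Without a library entry, fall back to the nearest subject.
                if entry is None or ctype == entry.subject_type:
                    subject, stype = cand, ctype
                    break
        if subject is not None:
            history.append((subject, stype))
        resolved.append((clause.verb, subject))
    return resolved


if __name__ == "__main__":
    # 本装置包括底座和支架，支架连接底座，[Ø]用于加工零件
    # "The device comprises a base and a bracket; the bracket is connected
    #  to the base; [Ø] is used for machining parts."
    text = [
        Clause("包括", "装置", "device"),
        Clause("连接", "支架", "component"),
        Clause("用于", None),
    ]
    print(resolve_zero_subjects(text))
    # → [('包括', '装置'), ('连接', '支架'), ('用于', '装置')]
```

In this toy run, the type constraint is what skips the nearer subject 支架 (bracket) and selects 装置 (device); a rule based on recency alone would pick the bracket.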
Chin Wei, Qiao Xiaodong, Liu Yao, Qi Xiaoya. Study on the Rules of Zero Anaphora Resolution in Chinese Patent Literature[J]. Library and Information Service, 2015, 59(9): 73-79, 142. DOI: 10.13266/j.issn.0252-3116.2015.09.011