Knowledge Organization

Automatic Word Sense Disambiguation of Ancient Chinese Based on Vector Space Model

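  • Chang E
  • Zhang Changxiu
  • Hou Hanqing
  • Hui Fuping

  1. Southeast University Library, Nanjing 210096;
  2. School of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095;
  3. School of Humanities and Social Sciences, Nanjing Agricultural University, Nanjing 210095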
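Chang E, Associate Research Librarian, Southeast University Library, E-mail: chang_e@seu.edu.cn; Zhang Changxiu, Associate Research Librarian, Southeast University Library; Hou Hanqing, Professor, School of Information Science and Technology, Nanjing Agricultural University; Hui Fuping, Professor, School of Humanities and Social Sciences, Nanjing Agricultural University.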

Received: 2012-08-15

Revised: 2012-11-13

Published online: 2013-01-20

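Funding

This paper is one of the research outputs of two projects. The first is the National Social Science Fund of China project "Research on Intelligent Technologies for the Collation and Exploitation of Ancient Books" (project No. 08ATQ002). The second is the Specialized Research Fund for the Doctoral Program of Higher Education project "Design and Construction of an Automatic Compilation and Annotation System for Ancient Agricultural Book Materials" (project No. 20090097110033).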

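Cite this article

Chang E, Zhang Changxiu, Hou Hanqing, Hui Fuping. Automatic word sense disambiguation of ancient Chinese based on vector space model[J]. Library and Information Service, 2013, 57(2): 114-118. DOI: 10.7536/j.issn.0252-3116.2013.02.022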

Abstract

Annotating word meanings is an important task in the collation of ancient Chinese books, and manual interpretation is time-consuming and laborious. Drawing on research into word sense disambiguation for modern Chinese, this paper proposes an improved unsupervised disambiguation method for ancient Chinese based on the vector space model. Supported by a knowledge base of ancient Chinese polysemous words and the words associated with each of their senses, the method maps both the contexts of an ambiguous word and that word's senses into the vector space model to carry out disambiguation. Using the full-text database of ancient Chinese agricultural books as the statistical corpus, a sense-tagging experiment was conducted on 10 typical ancient Chinese polysemous words, covering 29 senses and 1,836 contexts to be disambiguated. The average disambiguation accuracy reached 79.5%.
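The abstract describes the method only at a high level. The sketch below illustrates generic vector-space sense selection; it is not the authors' improved model, whose term weighting and knowledge base are not given here. It represents the characters around an ambiguous word, and the indicator words of each sense, as term-frequency vectors. It then picks the sense with the highest cosine similarity. Everything specific in the code is a hypothetical assumption for illustration:

* character-level tokens;
* the ±4-character window;
* the punctuation filter;
* the sense labels and indicator characters in `SENSE_KB`;
* the two example sentences.

```python
import math
from collections import Counter

# Assumption: punctuation characters to skip when building context vectors.
PUNCT = set("，。、；：！？「」")


def context_window(sentence, target, k=4):
    """Characters within k positions of the first occurrence of `target`,
    excluding the target itself (character-level tokens are an assumption)."""
    i = sentence.find(target)
    if i < 0:
        return []
    left = sentence[max(0, i - k):i]
    right = sentence[i + len(target):i + len(target) + k]
    return [ch for ch in left + right if not ch.isspace() and ch not in PUNCT]


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(sentence, target, sense_kb, k=4):
    """Return (best sense or None, scores): the sense whose indicator-word vector
    is most similar to the context vector; None if no sense overlaps at all."""
    ctx = Counter(context_window(sentence, target, k))
    scores = {sense: cosine(ctx, Counter(words)) for sense, words in sense_kb.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


def accuracy(predicted, gold):
    """Fraction of contexts whose predicted sense matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical knowledge-base entry for 种: sense label -> indicator characters.
SENSE_KB = {
    "seed": ["穗", "收", "藏", "选", "粒"],
    "sow": ["田", "耕", "时", "月", "地"],
}

if __name__ == "__main__":
    contexts = ["择好穗收藏以为种", "三月耕田而种之"]  # invented example sentences
    gold = ["seed", "sow"]
    predicted = [disambiguate(s, "种", SENSE_KB)[0] for s in contexts]
    print(predicted)                  # ['seed', 'sow']
    print(accuracy(predicted, gold))  # 1.0
```

The `accuracy` helper computes the accuracy for a single word as correct contexts divided by total contexts. The abstract does not say how the reported 79.5% is averaged. Averaging per-word accuracies over the 10 words, or pooling all 1,836 contexts, would each be an assumption.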
