Library and Information Service
Automatic Word Sense Disambiguation of Ancient Chinese Based on Vector Space Model
Received date: 2012-08-15
Revised date: 2012-11-13
Online published: 2013-01-20
Annotating word meanings is an important task in the collation of ancient Chinese books, but manual interpretation is time-consuming and laborious. Drawing on word sense disambiguation methods for modern Chinese, this paper proposes an improved unsupervised disambiguation method for ancient Chinese based on the vector space model. To disambiguate word senses, a knowledge repository of ancient Chinese polysemous words is built, and the contexts and senses of the polysemous words are mapped into the vector space model. The paper uses a full-text database of ancient Chinese agricultural books as the statistical corpus and runs the experiment on 10 typical polysemous words of ancient Chinese, covering 29 senses and 1836 contexts. The results show an average disambiguation accuracy of 79.5%.
Key words: vector space model; semantic disambiguation; ancient Chinese
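The general idea summarized in the abstract, mapping both a context and each candidate sense into a common vector space and choosing the closest sense, can be illustrated with a minimal sketch. This is not the authors' improved method: the abstract does not give the weighting scheme, the structure of the knowledge repository, or the similarity measure, so the sketch assumes raw term-frequency vectors and cosine similarity. The names `to_vector`, `cosine`, `disambiguate`, and the toy repository `SENSES` with its English glosses are invented for illustration only.

```python
import math
from collections import Counter


def to_vector(tokens):
    """Map a list of tokens to a sparse term-frequency vector (assumed weighting)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_tokens, sense_repository):
    """Return the best-scoring sense label and the score of every sense."""
    context_vec = to_vector(context_tokens)
    scores = {
        sense: cosine(context_vec, to_vector(words))
        for sense, words in sense_repository.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy repository: each sense of one polysemous word is described
# by words that typically accompany it. These entries are invented and are
# not taken from the paper's knowledge repository.
SENSES = {
    "grain_crop": ["sow", "field", "harvest", "seed"],
    "valley": ["mountain", "stream", "deep", "water"],
}

# Hypothetical context surrounding one occurrence of the polysemous word.
context = ["sow", "seed", "in", "spring", "field"]

best, scores = disambiguate(context, SENSES)
print(best, scores)
```

In this sketch the context shares three words with the first sense and none with the second, so the first sense is selected. A real system along the lines the abstract describes would derive the sense descriptions and term weights from corpus statistics rather than from a hand-written dictionary.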
Chang E, Zhang Changxiu, Hou Hanqing, Hui Fuping. Automatic Word Sense Disambiguation of Ancient Chinese Based on Vector Space Model[J]. Library and Information Service, 2013, 57(02): 114-118. DOI: 10.7536/j.issn.0252-3116.2013.02.022