Knowledge Organization

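An Approach of Ranking Synonym Extracting Results Based on Directed Graph

  • Liu Wei
  • Institute of Scientific & Technical Information of China, Beijing 100038
Liu Wei (ORCID: 0000-0003-2857-5474), associate researcher, E-mail: liuw@istic.ac.cn.

Received date: 2015-05-14

  Revised date: 2015-05-25

  Online published: 2015-06-20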

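Funding

This work is one of the research outputs of the National Natural Science Foundation of China project "Research on Constructing Scientific Research Relationship Networks Based on Massive Digital Resources" (Grant No. 71273251).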


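Abstract

[Purpose/significance] Current synonym extraction methods inevitably return results that contain considerable noise, and removing that noise manually is costly. This paper proposes a method for ranking synonym extraction results so that correct synonyms are ranked ahead of noise, which improves the accuracy of the results and reduces the cost of manual noise removal. [Method/process] The extraction results are transformed into a directed graph of extraction relations. Based on this graph, the semantic similarity between each term in the result and the term being extracted (the query word or phrase) is computed, and the terms are ranked by that similarity. The distinguishing feature of the method is that it relies only on the existing synonym extraction method, requiring neither human involvement nor additional semantic knowledge. [Result/conclusion] Experiments on a real dataset show that ranking effectiveness is positively related to the size of the extraction result: the more synonym candidates are extracted for a given term, the better the ranking.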
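The abstract does not give the similarity formula, so the following Python sketch only illustrates the general idea of ranking on an extraction-relation graph and is not the author's actual algorithm. It assumes a hypothetical `extract` function standing in for any existing synonym extractor, draws an edge u -> v whenever v appears in the extraction result for u, and scores each candidate with an assumed measure: the Jaccard overlap of the two terms' graph neighbourhoods plus a bonus when the candidate's own result points back to the query term. The toy data are invented for illustration.

```python
# Hypothetical toy extractor output: term -> terms returned as "synonyms".
# These values are invented; a real system would call an actual extractor.
TOY_RESULTS = {
    "car": ["automobile", "auto", "vehicle", "price"],
    "automobile": ["car", "auto", "vehicle"],
    "auto": ["car", "automobile"],
    "vehicle": ["car", "truck"],
    "price": ["cost", "value"],
}


def extract(term):
    """Stand-in for an existing synonym extraction method (assumption)."""
    return TOY_RESULTS.get(term, [])


def build_extraction_graph(target, extract_fn):
    """Build a directed graph as {node: set of successors}.

    An edge u -> v means v appeared in the extraction result for u.
    The extractor is re-run once on every candidate of the target.
    """
    graph = {target: set(extract_fn(target))}
    for candidate in graph[target]:
        graph[candidate] = set(extract_fn(candidate))
    return graph


def rank_candidates(target, graph):
    """Rank the target's candidates by an assumed similarity score."""
    target_hood = graph[target] | {target}
    scores = {}
    for candidate in graph[target]:
        successors = graph.get(candidate, set())
        cand_hood = successors | {candidate}
        # Jaccard overlap of the two neighbourhoods (assumed measure).
        overlap = len(target_hood & cand_hood) / len(target_hood | cand_hood)
        # Bonus when the candidate's result links back to the target.
        mutual = 1.0 if target in successors else 0.0
        scores[candidate] = overlap + mutual
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


graph = build_extraction_graph("car", extract)
ranking = rank_candidates("car", graph)
print(ranking)
```

On this toy input the noisy candidate `price` ends up last because its own extraction result shares nothing with that of `car` and does not link back to it. Like the method described above, the sketch uses only the extractor's own output and no external semantic resource.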

Cite this article

Liu Wei. An Approach of Ranking Synonym Extracting Results Based on Directed Graph[J]. Library and Information Service, 2015, 59(12): 128-134. DOI: 10.13266/j.issn.0252-3116.2015.12.019


