Received date: 2014-10-10
Revised date: 2014-11-20
Online published: 2014-12-05
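Funding
This paper is one of the research outcomes of the National Social Science Fund of China major project "Research on Knowledge Organization and Navigation Mechanisms for Domain-Specific Network Resources" (project No. 12&ZD222) and the Ministry of Education Humanities and Social Sciences Research Youth Fund project "Research on Sentiment Orientation Analysis of Hot Topics in Light Blogs" (project No. 12YJC870023).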
Research on Semantic Relatedness of Domain-specific Concepts Based on Chinese Wikipedia
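Wang Juan, Cao Shujin, Jiang Lingmin, Hu Qing. Research on Semantic Relatedness of Domain-specific Concepts Based on Chinese Wikipedia[J]. Library and Information Service (图书情报工作), 2014, 58(23): 136-142. DOI: 10.13266/j.issn.0252-3116.2014.23.021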
To improve the accuracy of relatedness computation for domain-specific concepts, this paper proposes a new semantic relatedness algorithm that uses both the category structure of Chinese Wikipedia and the explanatory text of its concept articles. Concepts from library and information science in the Chinese Wikipedia concept hierarchy serve as the experimental objects. The weighted algorithm, which combines category and text information, is compared with algorithms based only on the Chinese Wikipedia category structure (Relwup and Relseco) and with an algorithm based only on Chinese Wikipedia article text (Relstr). The experimental results show that the weighted algorithm outperforms the others and can provide important technical support for applications such as domain-oriented information retrieval and domain ontology construction.
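The idea of weighting a category-based score against a text-based score can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy category tree, the Wu-Palmer-style depth formula standing in for a category measure, plain bag-of-words cosine standing in for a text measure, the linear combination, and the weight `alpha`. The abstract does not give the paper's actual formulas or weights, and the real Wikipedia category graph allows multiple parents, which this tree simplification ignores.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy category tree (child -> parent); not taken from the paper.
PARENT = {
    "Root": None,
    "Information science": "Root",
    "Library science": "Information science",
    "Information retrieval": "Information science",
    "Cataloging": "Library science",
}


def path_to_root(node):
    """Return the chain of categories from node up to the root (node first)."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b that is also an ancestor of a is the deepest shared one.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def cosine(tokens_a, tokens_b):
    """Cosine similarity of two token lists using raw term counts."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def weighted_relatedness(cat_score, text_score, alpha=0.5):
    """Linear mix of the two scores; alpha is an assumed tuning weight in [0, 1]."""
    return alpha * cat_score + (1 - alpha) * text_score


if __name__ == "__main__":
    cat = wu_palmer("Cataloging", "Information retrieval")
    txt = cosine(["library", "catalog", "record"], ["library", "search", "query"])
    print(weighted_relatedness(cat, txt, alpha=0.5))
```

For Chinese article text, the token lists would come from a word segmenter rather than whitespace splitting, and `alpha` would be chosen by comparing the combined score against human relatedness judgments.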