Abstract
To address the lack of semantic understanding in traditional Web text mining, this paper combines ontologies with Web text mining and explores a Web text mining method based on domain ontology. It first builds an ontology structure for Web text, then introduces a "concept-concept" similarity matrix for the domain ontology and describes how relations between concepts are identified, and finally presents an implementation of Web text mining that uncovers the underlying meaning of Web text. In a case study on online media reports, conclusions are drawn from the mined text.
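The abstract does not say how the "concept-concept" similarity matrix is computed, so the following is only a minimal sketch under stated assumptions: the toy is-a hierarchy, the concept names, and the path-length similarity measure are all hypothetical illustrations and are not taken from the paper.

```python
from itertools import product

# Hypothetical toy domain ontology: child -> parent ("is-a" links).
# These concepts are illustrative only and do not come from the paper.
PARENT = {
    "media_report": "document",
    "news_article": "media_report",
    "blog_post": "media_report",
    "comment": "document",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_similarity(a, b):
    """Assumed measure: 1 / (1 + is-a edges on the shortest path a..b)."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(path_b)}
    for i, c in enumerate(path_a):
        if c in depth_in_b:  # first shared ancestor is the lowest one
            return 1.0 / (1 + i + depth_in_b[c])
    return 0.0  # no common ancestor


def similarity_matrix(concepts):
    """Map every ordered concept pair to its similarity score."""
    return {(a, b): path_similarity(a, b)
            for a, b in product(concepts, repeat=2)}


CONCEPTS = ["document", "media_report", "news_article", "blog_post", "comment"]
MATRIX = similarity_matrix(CONCEPTS)
```

Under this assumed measure a concept is maximally similar to itself, and sibling concepts such as `news_article` and `blog_post` score higher than concepts that share only the root. An information-content measure could be swapped in for `path_similarity` without changing the matrix construction.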
Key words
ontology /
Web text mining /
domain ontology
Funding
General Project of the 2010 Shanghai Philosophy and Social Sciences Planning Program