Abstract
To address users' differing requirements for indexing terms, this paper proposes an improved cosine vector measuring method (ICVMM) text retrieval model. The model divides indexing terms into main indexing terms and characteristic indexing terms. It then modifies the initial weights of the characteristic indexing terms of both the texts and the query according to the correlation probability values of those characteristic indexing terms in the set of query-relevant texts. Using examples, the paper compares the query effectiveness of the traditional cosine vector measuring method (TCVMM) text retrieval model with that of the ICVMM text retrieval model, and shows that the results of the ICVMM model are closer to users' needs.
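The abstract does not state the exact ICVMM formulas, so the Python sketch below only illustrates the general mechanism under assumptions that are ours, not the paper's. TCVMM is taken to be the standard cosine measure between term-weight vectors. The "correlation probability value" of a characteristic indexing term is assumed to be the fraction of texts in the query-relevant set that contain it. The modified weight is assumed to be the initial weight multiplied by that probability, with main indexing terms left unchanged. The function names (`cosine`, `correlation_probabilities`, `reweight`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Set

# A weight vector maps an indexing term to its weight (e.g. a tf-idf value).
Vector = Dict[str, float]


def cosine(doc: Vector, query: Vector) -> float:
    """Cosine measure (taken here as TCVMM): sim(d, q) = d.q / (|d| * |q|)."""
    dot = sum(w * query.get(t, 0.0) for t, w in doc.items())
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_d * norm_q)


def correlation_probabilities(relevant: List[Vector], features: Set[str]) -> Dict[str, float]:
    """ASSUMPTION (hypothetical): p(t) = share of query-relevant texts containing t."""
    n = len(relevant)
    return {
        t: (sum(1 for d in relevant if d.get(t, 0.0) > 0.0) / n if n else 0.0)
        for t in features
    }


def reweight(vec: Vector, features: Set[str], p: Dict[str, float]) -> Vector:
    """ASSUMPTION (hypothetical): scale each characteristic term's initial weight
    by its correlation probability; main indexing terms keep their weight."""
    return {t: (w * p.get(t, 0.0) if t in features else w) for t, w in vec.items()}


if __name__ == "__main__":
    # Hypothetical toy data: "retrieval" is the main indexing term,
    # "cosine" and "image" are characteristic indexing terms.
    query = {"retrieval": 1.0, "cosine": 1.0, "image": 1.0}
    docs = {
        "d1": {"retrieval": 1.0, "cosine": 1.0},
        "d2": {"retrieval": 1.0, "image": 1.0},
        "d3": {"retrieval": 1.0, "cosine": 1.0, "image": 1.0},
    }
    features = {"cosine", "image"}
    # Assume d1 and d3 make up the query-relevant text set.
    p = correlation_probabilities([docs["d1"], docs["d3"]], features)
    # Modify both the query and the texts, as the abstract describes.
    q_mod = reweight(query, features, p)
    for name, d in docs.items():
        tcvmm = cosine(d, query)
        icvmm = cosine(reweight(d, features, p), q_mod)
        print(f"{name}: TCVMM={tcvmm:.4f}  ICVMM={icvmm:.4f}")
```

Under these assumptions, d1 and d2 tie under the plain cosine measure. After reweighting, d1 scores higher than d2 because it shares the characteristic term that occurs more often in the relevant set. The paper's own weighting formula may differ.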
Key words
ICVMM text retrieval model /
correlation probability value /
weight vector /
main indexing term /
characteristic indexing term